Welcome to my academic page
I am a PhD candidate (ABD) in political science at UIUC. My research uses computational and statistical methods to study substantive questions in mass political behavior and political economy, in both American and comparative contexts. My research projects include dynamic modeling of mass partisan polarization and segregation, machine learning model evaluation for binary classification tasks, contextual and temporal variation in individuals' attitudes toward globalization, a Lasso-augmented time series alternative to the synthetic control method, bootstrap standard errors for matching estimators, and meta-analysis as a multiple hypothesis testing problem. I am on the academic job market.
Selected Working Paper Abstracts
Paper 1: A dynamic model of polarization under segregation (Preparing for submission)
Abstract: Mass partisan polarization has become a phenomenon of great concern in the US and other industrialized democracies in recent decades. Existing research that considers mass partisan polarization and geography jointly focuses exclusively on the extent of partisan geographical sorting and neglects the substantial effect geographical context can have on the formation of mass partisan polarization. More specifically, constructive intergroup contact in less geographically segregated contexts can significantly mitigate the mass partisan ideological polarization process. We provide evidence for this hypothesis through a computational model and an empirical analysis of county-level presidential election results from 1980 to 2020. Both the computational modeling and the empirical analysis strongly support the argument that geographical context significantly conditions the mass partisan polarization process: higher geographical segregation corresponds to faster convergence to partisan polarization. These results provide new insights into the relationship between geography and mass partisan ideological polarization. (PolMeth poster)
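The abstract does not specify the computational model, so the following is only a loose, illustrative sketch of the mechanism it describes: two partisan groups whose members either reinforce each other through in-group contact or moderate through cross-group contact, with a single `segregation` parameter controlling how often contact stays within the group. All names and dynamics here are my assumptions, not the paper's actual model.

```python
import random

def simulate(segregation, n=100, steps=4000, pull=0.05, seed=0):
    """Toy two-group opinion-dynamics model (illustrative only).
    Each step, one agent meets an in-group member with probability
    `segregation` (reinforcing contact pushes the agent further toward
    the partner's side) or an out-group member otherwise (moderating
    contact averages the two opinions). Returns the absolute gap
    between the two group means as a crude polarization measure."""
    rng = random.Random(seed)
    half = n // 2
    # group 0 starts slightly left of center, group 1 slightly right
    ops = [rng.gauss(-0.2, 0.3) for _ in range(half)] + \
          [rng.gauss(0.2, 0.3) for _ in range(half)]
    group = [0] * half + [1] * half
    for _ in range(steps):
        i = rng.randrange(n)
        if rng.random() < segregation:
            # in-group contact: biased assimilation amplifies the opinion
            j = rng.choice([k for k in range(n)
                            if group[k] == group[i] and k != i])
            ops[i] = max(-1.0, min(1.0, ops[i] + pull * ops[j]))
        else:
            # cross-group contact: the agent moves toward the partner
            j = rng.choice([k for k in range(n) if group[k] != group[i]])
            ops[i] += pull * (ops[j] - ops[i])
    mean0 = sum(ops[:half]) / half
    mean1 = sum(ops[half:]) / half
    return abs(mean1 - mean0)
```

In this toy setup a high segregation parameter drives the two group means toward opposite extremes much faster than a low one, mirroring the abstract's claim that higher geographical segregation corresponds to faster convergence to polarization.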
Paper 2: Are we bootstrapping the right thing? A new approach to quantify uncertainty of ATT Estimates (Under Review)
Abstract: This paper proposes a new non-parametric bootstrap method to quantify the uncertainty of average treatment effect on the treated (ATT) estimates from matching estimators. More specifically, it bootstraps the treatment group only and finds the counterpart control group by pair matching on the estimated propensity score without replacement. We demonstrate the validity of this approach and compare it with existing bootstrap approaches through Monte Carlo simulation and analysis of a real-world data set. The results indicate that the proposed approach constructs confidence intervals and standard errors with coverage rates at or above 95 percent and better precision than existing bootstrap approaches, though these measures also depend on the percent treated in the sample and the sample size. (arXiv preprint)
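The procedure in the abstract can be sketched in a few lines. This is my reading of the described steps, not the paper's code: the function name and details are illustrative, and propensity scores `ps` are assumed to be estimated upstream (e.g., by logistic regression on the covariates). Each replicate resamples only the treated units, then pair-matches each resampled treated unit to its nearest still-unused control on the propensity score (without replacement); the sketch assumes at least as many controls as treated units.

```python
import numpy as np

def att_bootstrap(y, treated, ps, n_boot=200, seed=0):
    """Treated-only bootstrap for the ATT (illustrative sketch).
    y: outcomes; treated: 0/1 indicator; ps: estimated propensity scores."""
    rng = np.random.default_rng(seed)
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    atts = []
    for _ in range(n_boot):
        # resample the treatment group with replacement
        sample = rng.choice(t_idx, size=t_idx.size, replace=True)
        available = set(c_idx.tolist())
        diffs = []
        for i in sample:
            # nearest unused control on the propensity score
            j = min(available, key=lambda k: abs(ps[k] - ps[i]))
            available.remove(j)
            diffs.append(y[i] - y[j])
        atts.append(np.mean(diffs))
    atts = np.asarray(atts)
    # bootstrap SE and a 95% percentile confidence interval
    return atts.mean(), atts.std(ddof=1), np.percentile(atts, [2.5, 97.5])
```

The standard deviation across replicates gives the bootstrap standard error, and the 2.5th/97.5th percentiles give a percentile-style confidence interval; the paper compares these against existing bootstrap approaches.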
Paper 3: Area under the ROC Curve has the Most Consistent Evaluation for Binary Classification (Revise and Resubmit)
Abstract: The choice of evaluation metric is an important question for model evaluation and model selection in binary classification tasks. This study investigates how consistently metrics evaluate different models under different data scenarios. Analyzing over 150 data scenarios and 18 model evaluation metrics using statistical simulation, I find that for binary classification tasks, evaluation metrics that are less influenced by prevalence offer more consistent rankings of a set of different models. In particular, Area Under the ROC Curve (AUC) has the smallest variance in the ranking of different models. The Matthews correlation coefficient, a stricter measure of model performance, has the second smallest variance. These patterns hold across a rich set of data scenarios and five commonly used machine learning models, as well as a naive random-guess model. The results have significant implications for model evaluation and model selection in binary classification tasks. (arXiv preprint)
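For reference, the two metrics the abstract singles out have simple closed forms. The sketch below uses the standard definitions (it is not the paper's simulation code): AUC via the rank-sum identity, i.e., the probability that a randomly chosen positive outranks a randomly chosen negative, assuming no tied scores, and the Matthews correlation coefficient from the confusion-matrix counts.

```python
import numpy as np

def auc(y, scores):
    """AUC via the Mann-Whitney rank-sum identity (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y == 1
    n1, n0 = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def mcc(y, pred):
    """Matthews correlation coefficient from confusion-matrix counts."""
    tp = np.sum((pred == 1) & (y == 1))
    tn = np.sum((pred == 0) & (y == 0))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Note the structural difference the abstract exploits: AUC depends only on the ranking of scores, which is one reason it is less sensitive to prevalence, while MCC depends on the thresholded confusion matrix and so on all four cell counts.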