Welcome to my academic page

I am a PhD candidate (ABD) in political science at UIUC. My research interests are in statistical and computational methods and, substantively, in comparative and international political economy. I have written research papers on bootstrap standard errors for matching estimators, model evaluation metrics for binary classification tasks, contextual and temporal variation in individuals' attitudes towards globalization, a Lasso-augmented alternative to the synthetic control method (estimating the effect of Hong Kong's sovereignty return to China), and meta-analysis as a multiple hypothesis testing problem. Please feel free to reach out.

Research Paper Abstracts

Paper 1: Are we bootstrapping the right thing? A new approach to quantify uncertainty of ATT Estimates

Abstract: This paper proposes a new non-parametric bootstrap method for quantifying the uncertainty of average treatment effect on the treated (ATT) estimates from matching estimators. More specifically, it quantifies the uncertainty associated with treatment effect variation in the treated group by bootstrapping the treatment group only and recovering a counterpart control group by pair-matching on the estimated propensity score. We demonstrate the validity of this approach and compare it with existing approaches through Monte Carlo simulations and real-world data. The results indicate that the proposed approach achieves coverage rates comparable to existing bootstrap approaches while producing more precise standard errors. (Link)
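The core procedure, bootstrapping the treated units only and re-matching each replicate to controls by nearest estimated propensity score, can be sketched in a few lines. This is a minimal illustration under an assumed simulated data-generating process (one confounder, true ATT of 2.0, a bare-bones logistic regression for the propensity score); it is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical setup): one covariate drives both
# treatment assignment and the outcome; the true ATT is 2.0.
n = 500
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x)))
y = 2.0 * t + x + rng.normal(size=n)

# Estimate propensity scores with a bare-bones logistic regression
# fitted by gradient ascent (a stand-in for any standard estimator).
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.1 * X.T @ (t - p) / n
pscore = 1 / (1 + np.exp(-X @ beta))

treated = np.where(t == 1)[0]
controls = np.where(t == 0)[0]

def att(sample):
    # Pair-match each treated unit to the control unit with the
    # nearest estimated propensity score; average the differences.
    nearest = controls[np.abs(pscore[controls][None, :]
                              - pscore[sample][:, None]).argmin(axis=1)]
    return np.mean(y[sample] - y[nearest])

# Bootstrap the treated group only, re-matching within each replicate.
att_hat = att(treated)
boot = np.array([att(rng.choice(treated, size=treated.size, replace=True))
                 for _ in range(200)])
se_hat = boot.std(ddof=1)
print(f"ATT = {att_hat:.2f}, bootstrap SE = {se_hat:.2f}")
```

The key design point is that the control group is not resampled: each bootstrap replicate redraws treated units and then reconstructs its own matched comparison set, so the replication distribution reflects variation over the treated group.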

Paper 2: Data Imbalance and the Use of Binary Classification Performance Metrics

Abstract: This paper examines a critical question in the application of machine learning models in the social sciences: the choice of model evaluation metrics for binary classification tasks. More specifically, it investigates how sample imbalance, measured by the prevalence level, affects the values of various confusion matrix performance metrics, including TPR, TNR, PPV, NPV, balanced accuracy, bookmaker informedness, the F1 score, and the Matthews correlation coefficient, as well as the commonly used accuracy and area under the ROC curve measures. The results indicate that the accuracy measure is dominated by the majority class as the data become more imbalanced, whereas balanced accuracy (as well as bookmaker informedness) and the Matthews correlation coefficient take model performance on both classes into account. The F1 score, meanwhile, increases monotonically with the prevalence level. These patterns hold regardless of which specific type of model is used for making predictions. The results have significant implications for applications of machine learning models in the field. (Link)
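The prevalence effects described above can be illustrated with a small simulation. The setup here (a classifier with fixed 80% TPR and 80% TNR, a sample of 100,000, and three prevalence levels) is an illustrative choice, not taken from the paper: holding per-class performance constant, the F1 score and the Matthews correlation coefficient fall as prevalence drops, balanced accuracy stays put, and a trivial always-negative predictor's accuracy climbs.

```python
import numpy as np

rng = np.random.default_rng(1)

def scores(y_true, y_pred):
    # Confusion-matrix counts and the derived metrics discussed above.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tpr, tnr = tp / (tp + fn), tn / (tn + fp)
    ppv = tp / (tp + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    bal_acc = (tpr + tnr) / 2
    f1 = 2 * ppv * tpr / (ppv + tpr)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        float(tp + fp) * float(tp + fn) * float(tn + fp) * float(tn + fn))
    return acc, bal_acc, f1, mcc

# A classifier with fixed 80% TPR and 80% TNR, evaluated as the
# prevalence of the positive class drops.
results = {}
for prev in (0.5, 0.1, 0.01):
    n = 100_000
    y = rng.binomial(1, prev, n)
    pred = np.where(rng.random(n) < 0.2, 1 - y, y)  # 20% error on each class
    results[prev] = scores(y, pred)
    acc, bal_acc, f1, mcc = results[prev]
    print(f"prev={prev:<4} acc={acc:.3f} bal_acc={bal_acc:.3f} "
          f"f1={f1:.3f} mcc={mcc:.3f} (always-negative acc={1 - prev:.3f})")
```

Because the classifier's per-class error rates never change across the three runs, any movement in F1 or MCC is attributable purely to prevalence, which is the comparison the abstract draws.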