Statistics Seminar Series: Chamika Porage
- Date
- 17 December 2025, 10:15–11:30
- Location
- Ekonomikum, H317
- Type
- Seminar
- Organiser
- Department of Statistics
Speaker Chamika Porage, Department of Statistics, Uppsala University
Opponent Mattias Nordin, Department of Statistics, Uppsala University
Topic Evaluating model misspecification in prognostic score based average treatment effect estimation
Abstract Accurate estimation of causal effects in observational studies requires methods that account for confounding, particularly when models are subject to misspecification. Prognostic scores, which are related to outcome regression, have been introduced as an alternative to propensity scores for the estimation of the treatment effect. This study examines the performance of average treatment effect estimators based on prognostic scores and full prognostic scores (FPGS). We use various modeling approaches, including regression imputation and matching using both parametric and non-parametric techniques. Through simulation studies, we assess how model misspecification, sample size, and choice of estimator affect the bias, standard error, and mean squared error of the estimator. Our findings indicate that, under correct model specification, parametric regression imputation estimators based on ordinary least squares outcome regression produce lower bias and mean squared error than regression imputation estimators based on random forest regression. In contrast, parametric regression imputation estimators exhibit a substantial increase in bias and mean squared error when the outcome regression is misspecified. For matching estimators, performance depends critically on the choice of adjustment score, particularly in the presence of heterogeneous treatment effects: prognostic score-based matching may remain biased even when the outcome regression is correctly specified, whereas FPGS-based matching achieves lower bias and improved mean squared error. While parametric methods perform well under correct specification, non-parametric approaches provide some flexibility against misspecification but require larger samples to achieve stability.
When comparing prognostic score and FPGS-based methods, the results suggest that FPGS-based estimators may offer advantages in most cases, for example, when using regression imputation and matching estimators in the presence of heterogeneous treatment effects. To demonstrate the investigated estimators, we apply them to data from the 2017–2018 National Health and Nutrition Examination Survey (NHANES), investigating the effect of smoking on blood lead levels and comparing results from a correctly specified model and a misspecified model.
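As a rough illustration of the kind of comparison the abstract describes, the following Python sketch computes an ordinary least squares regression imputation estimate of the average treatment effect under a correctly specified and a misspecified outcome model. It is not the speaker's code or simulation design: the data-generating process, the sample size, the true effect of 2.0, and the function names ols_fit and regression_imputation_ate are all assumptions made for this example, and the treatment effect is kept homogeneous for simplicity.

```python
import numpy as np


def ols_fit(design, y):
    """Least-squares coefficients for y regressed on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def regression_imputation_ate(design, treated, y):
    """Fit one outcome regression per treatment arm, predict both potential
    outcomes for every unit, and average the difference."""
    coef_treated = ols_fit(design[treated], y[treated])
    coef_control = ols_fit(design[~treated], y[~treated])
    return float(np.mean(design @ coef_treated - design @ coef_control))


# Hypothetical data-generating process (an assumption of this sketch).
rng = np.random.default_rng(0)
n = 4000
true_ate = 2.0
x = rng.normal(size=n)

# Treatment is more likely for large |x|, so x**2 confounds the comparison.
propensity = 1.0 / (1.0 + np.exp(-2.0 * (x**2 - 1.0)))
treated = rng.random(n) < propensity

# The outcome depends on x and x**2; the treatment effect is constant.
y = true_ate * treated + x + x**2 + rng.normal(size=n)

# Correct specification includes the quadratic term; the misspecified one omits it.
correct_design = np.column_stack([np.ones(n), x, x**2])
misspecified_design = np.column_stack([np.ones(n), x])

ate_correct = regression_imputation_ate(correct_design, treated, y)
ate_misspecified = regression_imputation_ate(misspecified_design, treated, y)

print(f"true ATE:               {true_ate:.3f}")
print(f"correctly specified:    {ate_correct:.3f}")
print(f"misspecified (no x^2):  {ate_misspecified:.3f}")
```

In this setup the misspecified model cannot absorb the confounding that runs through x**2, so its estimate drifts away from the true effect while the correctly specified model stays close to it. A single draw like this only hints at the pattern; a simulation study of the kind described above would repeat it many times to estimate bias, standard error, and mean squared error.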