Causal inference with two-stage logistic regression: accuracy, precision, and application
Two-stage predictor substitution (2SPS) and two-stage residual inclusion (2SRI) are two approaches to instrumental variable (IV) analysis. While 2SPS and 2SRI with linear models are well-studied methods of causal inference, their properties for binary outcomes modeled by logistic regression have not been thoroughly studied. We study the bias and variance properties of 2SPS and 2SRI for a logistic outcome model so that these IV approaches can be applied to causal inference for binary outcomes. We also propose and implement an extension of the generalized structural mean model (GSMM), which was originally developed for randomized trials. First, we present closed-form expressions for the asymptotic bias of the causal odds ratio estimated by both 2SPS and 2SRI. These results show that 2SPS logistic regression yields asymptotically biased estimates of the causal odds ratio even when there is no unmeasured confounding, and that the bias increases with increasing unmeasured confounding. 2SRI logistic regression is asymptotically unbiased when there is no unmeasured confounding, but it is biased when unmeasured confounding is present, and the bias increases with increasing unmeasured confounding. Second, we propose sandwich variance estimators for both 2SPS and 2SRI logistic regression that are adjusted for the fact that estimates from the first-stage regression are included as covariates in the second-stage regression. Simulation results show that the adjusted variance estimates are consistent with the observed variance, whereas the naive estimates without the adjustment are biased. This study also shows that 2SRI has a larger variance than 2SPS. Lastly, we compare 2SPS and 2SRI logistic regression with the GSMM. Our simulation results show that the GSMM is an unbiased estimator of the complier-average causal effect (CACE) and has the smallest variance among the three approaches.
We apply these three methods to an analysis of General Practice Research Database (GPRD) data on the antidiabetic effect of bezafibrate.
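To make the difference between the two two-stage procedures concrete, the following is a minimal sketch on simulated data, not the implementation used in this work. The simulation design (a binary instrument `z`, an unmeasured confounder `u`, and all coefficient values), the logistic first stage, the Newton-Raphson solver, and the function names `fit_logistic` and `two_stage_estimates` are illustrative assumptions. The sketch covers point estimates only; it implements neither the adjusted sandwich variance estimator nor the GSMM.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson (hypothetical helper).

    X must already contain an intercept column.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        w = p * (1.0 - p)
        grad = X.T @ (y - p)                # score vector
        hess = X.T @ (X * w[:, None])       # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def two_stage_estimates(z, d, y):
    """Return 2SPS and 2SRI log odds ratio estimates for treatment d."""
    ones = np.ones_like(y, dtype=float)

    # Stage 1 (assumed logistic here): regress treatment on the instrument.
    X1 = np.column_stack([ones, z])
    alpha = fit_logistic(X1, d)
    d_hat = expit(X1 @ alpha)   # predicted treatment
    resid = d - d_hat           # first-stage residual

    # 2SPS: replace the observed treatment with its first-stage prediction.
    b_2sps = fit_logistic(np.column_stack([ones, d_hat]), y)

    # 2SRI: keep the observed treatment and add the residual as a covariate.
    b_2sri = fit_logistic(np.column_stack([ones, d, resid]), y)

    return {
        "2sps_log_or": b_2sps[1],
        "2sri_log_or": b_2sri[1],
        "d_hat": d_hat,
        "resid": resid,
    }


# Simulated data; every numeric value below is an arbitrary assumption.
rng = np.random.default_rng(0)
n = 20000
z = rng.integers(0, 2, n).astype(float)                  # binary instrument
u = rng.normal(size=n)                                   # unmeasured confounder
d = rng.binomial(1, expit(-1.0 + 2.0 * z + u)).astype(float)
y = rng.binomial(1, expit(-1.0 + 1.0 * d + u)).astype(float)

est = two_stage_estimates(z, d, y)
print("2SPS log odds ratio:", est["2sps_log_or"])
print("2SRI log odds ratio:", est["2sri_log_or"])
```

Because `u` enters both the treatment and outcome models in this simulation, neither printed estimate should be expected to recover the coefficient used to generate `y`. Standard errors reported by an off-the-shelf second-stage fit would be the naive ones, since they ignore that `d_hat` and `resid` are themselves estimated.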