An Algorithm for Creating Models for Imputation Using the MICE Approach: An Application in Stata
It is generally advised that imputation models contain as many "predictor" variables as possible, since the more variables a model includes, the more information is available from which to estimate the missing values (van Buuren, Boshuizen & Knook 1999). Ideally, an imputation model might contain all variables in the dataset. Hence, the default in software packages that perform multivariate imputation by chained equations (e.g., ice in Stata) is often to use all other variables in the imputation model to predict missing values. However, in datasets with moderate to large numbers of variables, attempting to use all other variables results in imputation models that are too large to actually run. One solution to this problem is, for each variable to be imputed, to select a relatively large but reasonable number of predictors based on bivariate correlations, and then to drop predictors as necessary until the resulting regression model is tractable using the complete data. This set of regression models forms the imputation model for the entire dataset. This presentation outlines the approach in more detail and presents an overview of the Stata package that implements it.
Year of publication: 2007-10-31
Authors: Medeiros, Rose
Institutions: Stata User Group
Similar items by person
- Using Regular Expressions for Data Management in Stata. Medeiros, Rose (2007)
- Likelihood Ratio Tests for Multiply Imputed Datasets: Introducing milrtest. Medeiros, Rose (2008)
- Nonstandard Deviation: Making the Global Local. Pagano, Marcello (2014)