Distribution-preserving statistical disclosure limitation
One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A misspecified imputation model can invalidate inferences based on the partially synthetic data, because the imputation model determines the distribution of synthetic values. We present a practical method to generate synthetic values when the imputer has only limited information about the true data-generating process. We combine a simple imputation model (such as regression) with density-based transformations that preserve the distribution of the confidential data, up to sampling error, on specified subdomains. We demonstrate through simulations and a large-scale application that our approach preserves important statistical properties of the confidential data, including higher moments, with low disclosure risk.
| Year of publication: | 2009 |
|---|---|
| Authors: | Woodcock, Simon D. ; Benedetto, Gary |
| Published in: | Computational Statistics & Data Analysis. - Elsevier, ISSN 0167-9473. - Vol. 53.2009, 12, p. 4228-4242 |
| Publisher: | Elsevier |
Similar items by person
- Distribution-Preserving Statistical Disclosure Limitation. Woodcock, Simon D., (2007)
- Distribution Preserving Statistical Disclosure Limitation. Woodcock, Simon D., (2006)
- Distribution-Preserving Statistical Disclosure Limitation. Woodcock, Simon D., (2007)