Distribution-preserving statistical disclosure limitation

One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences based on the partially synthetic data, because the imputation model determines the distribution of synthetic values. We present a practical method to generate synthetic values when the imputer has only limited information about the true data generating process. We combine a simple imputation model (such as regression) with density-based transformations that preserve the distribution of the confidential data, up to sampling error, on specified subdomains. We demonstrate through simulations and a large-scale application that our approach preserves important statistical properties of the confidential data, including higher moments, with low disclosure risk.
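
The idea of pairing a simple imputation model with a distribution-preserving transformation can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an ordinary least squares imputation model, replaces the paper's density-based transformation with a plain empirical quantile mapping, and skips drawing the regression parameters from their posterior. The function names (`quantile_map`, `synthesize`), the simulated data, and the two-level `domain` variable are all hypothetical.

```python
import numpy as np


def quantile_map(draws, target):
    """Push `draws` through their own empirical CDF, then through the
    empirical quantile function of `target` (an assumed stand-in for a
    density-based transformation)."""
    n = len(draws)
    ranks = np.argsort(np.argsort(draws))      # rank of each draw, 0..n-1
    u = (ranks + 0.5) / n                      # CDF positions strictly inside (0, 1)
    return np.quantile(target, u)              # interpolated quantiles of target


def synthesize(y, x, domain, rng):
    """Return one synthetic implicate of y (illustrative only)."""
    # Step 1: simple imputation model, here OLS of y on x with normal errors.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma = resid.std(ddof=X.shape[1])
    draws = X @ beta + rng.normal(0.0, sigma, size=len(y))

    # Step 2: within each subdomain, transform the draws so that their
    # distribution matches that of the confidential values.
    synthetic = np.empty_like(y)
    for d in np.unique(domain):
        mask = domain == d
        synthetic[mask] = quantile_map(draws[mask], y[mask])
    return synthetic


# Hypothetical confidential data: a skewed outcome that a linear-normal
# model would describe poorly.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
domain = rng.integers(0, 2, size=n)
y = np.exp(0.5 * x + 0.3 * domain + rng.normal(scale=0.5, size=n))

y_syn = synthesize(y, x, domain, rng)
```

Calling `synthesize` repeatedly with the same inputs would give several implicates, in the spirit of multiple imputation. The regression draw determines only each record's rank within its subdomain; the transformation determines the values, so the skewness of `y` carries over even though the imputation model assumes normal errors.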

Year of publication:	2009
Authors:	Woodcock, Simon D. ; Benedetto, Gary
Published in:	Computational Statistics & Data Analysis. - Elsevier, ISSN 0167-9473. - Vol. 53.2009, 12, p. 4228-4242
Publisher:	Elsevier

Type of publication:	Article
Source:	RePEc - Research Papers in Economics

Persistent link: https://www.econbiz.de/10005006059