Identification, data combination and the risk of disclosure
Businesses routinely rely on econometric models to analyze and predict consumer behavior. Estimating such models may require combining a firm's internal data with external datasets to account for sample selection, missing observations, omitted variables, and measurement errors in the existing data source. In this paper we point out that, under mild assumptions on the data distribution, these data problems can be addressed when estimating econometric models from combined data using data mining techniques. However, data combination poses serious threats to the security of consumer data: we demonstrate that point identification of an econometric model from combined data is incompatible with restrictions on the risk of individual disclosure. Consequently, if a consumer model is point identified, the firm would (implicitly or explicitly) reveal the identity of at least some of the consumers in its internal data. More importantly, we argue that unless the firm restricts the individual disclosure risk when combining data, an adversary or a competitor can gather confidential information about some individuals from the estimated model, even if the raw combined dataset is not shared with a third party.
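To make the link between data combination and disclosure more concrete, the sketch below computes one crude proxy for re-identification risk. For each internal record, it checks whether the values of the combining variables match exactly one record in an external dataset. This uniqueness measure is an illustrative assumption and is not the formal notion of individual disclosure risk used in the paper. The field names (`zip`, `age`, `spend`, `name`), the function `unique_match_share`, and the toy records are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: the firm's internal records and an external dataset
# that shares some "combining" variables (zip, age) with them.
INTERNAL = [
    {"zip": "60601", "age": 34, "spend": 120},
    {"zip": "60601", "age": 51, "spend": 80},
    {"zip": "60602", "age": 29, "spend": 45},
]
EXTERNAL = [
    {"zip": "60601", "age": 34, "name": "A"},
    {"zip": "60601", "age": 51, "name": "B"},
    {"zip": "60601", "age": 51, "name": "C"},
    {"zip": "60602", "age": 29, "name": "D"},
]


def unique_match_share(internal, external, keys):
    """Share of internal records whose key values match exactly one external record.

    A unique match means the external data single out one individual for that
    internal record. This is a simple illustrative proxy, not the paper's definition.
    """
    if not internal:
        return 0.0
    # Count how many external records share each combination of key values.
    counts = Counter(tuple(rec[k] for k in keys) for rec in external)
    unique = sum(1 for rec in internal if counts[tuple(rec[k] for k in keys)] == 1)
    return unique / len(internal)


if __name__ == "__main__":
    # Combining on zip and age: 2 of 3 internal records are matched uniquely.
    print(unique_match_share(INTERNAL, EXTERNAL, ["zip", "age"]))
    # Combining on zip only: just 1 of 3 internal records is matched uniquely.
    print(unique_match_share(INTERNAL, EXTERNAL, ["zip"]))
```

In this toy example, richer combining variables let the records be linked more precisely, and they also single out more individuals. Coarsening the keys lowers the uniqueness share at the cost of a less precise link. This illustrates the tension the abstract describes only informally.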
Year of publication: 2011-12
Authors: Komarova, Tatiana; Nekipelov, Denis; Yakovlev, Evgeny
Institutions: Centre for Microdata Methods and Practice (CEMMAP)
Similar items by person
- Estimation of treatment effects from combined data: identification versus data security. Komarova, Tatiana (2015)
- Identification, data combination, and the risk of disclosure. Komarova, Tatiana (2018)
- Identification, data combination and the risk of disclosure. Komarova, Tatiana (2011)