How Well Do Automated Methods Perform in Historical Samples? Evidence from New Ground Truth
Martha Bailey, Connor Cole, Morgan Henderson, Catherine Massey
New large-scale data linking projects are revolutionizing empirical social science. Outside of selected samples and tightly restricted data enclaves, little is known about the quality of these "big data" or how the methods used to create them shape inferences. This paper evaluates the performance of commonly used automated record-linking algorithms in three high-quality historical U.S. samples. Our findings show that (1) no method (including hand linking) consistently produces samples representative of the linkable population; (2) automated linking tends to produce very high rates of false matches, averaging around one third of links across datasets and methods; and (3) false links are systematically (though differently) related to baseline sample characteristics. A final exercise demonstrates the importance of these findings for inferences using linked data. For a common set of records, we show that algorithm assumptions can attenuate estimates of intergenerational income elasticities by almost 50 percent. Although differences in these findings across samples and methods caution against the generalizability of specific error rates, common patterns across multiple datasets offer broad lessons for improving current linking practice.
Year of publication: | November 2017 |
---|---|
Authors: | Bailey, Martha |
Other Persons: | Cole, Connor (contributor) ; Massey, Catherine (contributor) ; Henderson, Morgan (contributor) |
Institutions: | National Bureau of Economic Research (contributor) |
Publisher: | Cambridge, Mass : National Bureau of Economic Research |
freely available
Extent: | 1 Online-Ressource |
---|---|
Series: | NBER working paper series ; no. w24019 |
Type of publication: | Book / Working Paper |
Language: | English |
Notes: | Mode of access: World Wide Web. System requirements: Adobe [Acrobat] Reader required for PDF files. Hardcopy version available to institutional subscribers. |
Other identifiers: | 10.3386/w24019 [DOI] |
Source: | ECONIS - Online Catalogue of the ZBW |
Persistent link: https://www.econbiz.de/10012453694
Similar items by person
- How Well Do Automated Linking Methods Perform? Lessons from U.S. Historical Data
  Bailey, Martha J., (2017)
- How well do automated methods perform in historical samples? : evidence from new ground truth
  Bailey, Martha J., (2017)
- How well do automated linking methods perform? : lessons from US historical data
  Bailey, Martha J., (2020)