Ecological inference for 2 × 2 tables
A fundamental problem in many disciplines, including political science, sociology and epidemiology, is the examination of the association between two binary variables across a series of 2 × 2 tables, when only the margins are observed, and one of the margins is fixed. Two unobserved fractions are of interest, with only a single response per table, and it is this non-identifiability that is the inherent difficulty lying at the heart of ecological inference. Many methods have been suggested for ecological inference, often without a probabilistic model; we clarify the form of the sampling distribution and critique previous approaches within a formal statistical framework, thus allowing clarification and examination of the assumptions that are required under all approaches. A particularly difficult problem is choosing between models with and without contextual effects. Various Bayesian hierarchical modelling approaches are proposed to allow the formal inclusion of supplementary data, and/or prior information, without which ecological inference is unreliable. Careful choice of the prior within such models is required, however, since there may be considerable sensitivity to this choice, even when the model assumed is correct and there are no contextual effects. This sensitivity is shown to be a function of the number of areas and the distribution of the proportions in the fixed margin across areas. By explicitly providing a likelihood for each table, the combination of individual level survey data and aggregate level data is straightforward and we illustrate that survey data can be highly informative, particularly if these data are from a survey of the minority population within each area. This strategy is related to designs that are used in survey sampling and in epidemiology. An approximation to the suggested likelihood is discussed, and various computational approaches are described. Some extensions are outlined including the consideration of multiway tables, spatial dependence and area-specific (contextual) variables. Voter registration-race data from 64 counties in the US state of Louisiana are used to illustrate the methods. Copyright 2004 Royal Statistical Society.
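
The non-identifiability mentioned in the abstract can be made concrete for a single table. The sketch below is illustrative only and is not taken from the paper: it assumes the standard accounting identity q = p0·x + p1·(1 − x), where x is the proportion in the fixed margin, q is the observed response proportion, and p0, p1 are the two unobserved within-group fractions. All names (`fraction_bounds`, `implied_p1`, `x`, `q`, `p0`, `p1`) are hypothetical notation chosen here. Under that assumption, the margins alone determine only deterministic bounds on each fraction, and every p0 within its bounds pairs with a p1 that reproduces the same observed q.

```python
def fraction_bounds(x, q):
    """Deterministic bounds on (p0, p1) implied by q = p0*x + p1*(1 - x).

    x : proportion of the area in the first group (fixed margin), 0 < x < 1
    q : observed overall response proportion, 0 <= q <= 1
    Notation is an assumption of this sketch, not the paper's.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie between 0 and 1")
    # Each fraction is constrained by forcing the other to 0 or 1.
    p0_bounds = (max(0.0, (q - (1.0 - x)) / x), min(1.0, q / x))
    p1_bounds = (max(0.0, (q - x) / (1.0 - x)), min(1.0, q / (1.0 - x)))
    return p0_bounds, p1_bounds


def implied_p1(x, q, p0):
    """The p1 that reproduces the observed q for a candidate p0."""
    return (q - p0 * x) / (1.0 - x)


if __name__ == "__main__":
    x, q = 0.3, 0.5  # made-up margins for one hypothetical area
    (p0_lo, p0_hi), (p1_lo, p1_hi) = fraction_bounds(x, q)
    print("p0 bounds:", p0_lo, p0_hi)
    print("p1 bounds:", p1_lo, p1_hi)
    # Every candidate p0 in its bounds gives a valid p1: the data cannot
    # distinguish between these pairs without further assumptions.
    for p0 in (p0_lo, (p0_lo + p0_hi) / 2, p0_hi):
        print("p0 =", p0, "-> p1 =", implied_p1(x, q, p0))
```

With a single observed q per area there is one equation in two unknowns, which is why the abstract stresses that supplementary data or prior information are needed.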
| Year of publication: | 2004 |
|---|---|
| Authors: | Wakefield, Jon |
| Published in: | Journal of the Royal Statistical Society Series A. - Royal Statistical Society - RSS, ISSN 0964-1998. - Vol. 167.2004, 3, p. 385-425 |
| Publisher: | Royal Statistical Society - RSS |
Similar items by person
- Statistical Analysis of Environmental Space-Time Processes edited by N. D. Le and J. V. Zidek
  Wakefield, Jon (2007)
- Bayesian Methods for Examining Hardy–Weinberg Equilibrium
  Wakefield, Jon (2010)
- Errors-in-Variables in Joint Population Pharmacokinetic/Pharmacodynamic Modeling
  Bennett, James (2001)