Abstract
Today’s data analysts and modellers are in the luxurious position of being able to more closely describe, estimate, predict and infer about complex systems of interest, thanks not only to ever more powerful computational methods but also to wider ranges of modelling distributions. Mixture models constitute a fascinating illustration of these aspects: while remaining within a parametric family, they offer malleable approximations in non-parametric settings; although based on standard distributions, they pose highly complex computational challenges; and they are at once easy to constrain to meet identifiability requirements and yet fall within the class of ill-posed problems. They also provide an endless benchmark for assessing new techniques, from the EM algorithm to reversible jump methodology. In particular, they exemplify the formidable opportunity provided by new computational technologies like Markov chain Monte Carlo (MCMC) algorithms. It is no coincidence that Gibbs sampling algorithms for the estimation of mixtures were proposed both before (Tanner and Wong 1987) and immediately after (Diebolt and Robert 1990c) the seminal paper of Gelfand and Smith (1990): before MCMC was popularised, there simply was no satisfactory approach to the computation of Bayes estimators for mixtures of distributions, even though older importance sampling algorithms were later discovered to apply to the simulation of posterior distributions of mixture parameters (Casella et al. 2002).
Bayesian approaches to mixture modelling have attracted great interest among researchers and practitioners alike. The Bayesian paradigm (see, e.g., Berger 1985, Besag et al. 1995, Robert 2001) allows probability statements to be made directly about the unknown parameters, prior or expert opinion to be included in the analysis, and hierarchical descriptions of both local-scale and global features of the model. This framework also allows the complicated structure of a mixture model to be decomposed into a set of simpler structures through the use of hidden or latent variables. When the number of components is unknown, it can well be argued that the Bayesian paradigm is the only sensible approach to its estimation (Richardson and Green 1997).
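The latent-variable decomposition mentioned above can be illustrated in code: augmenting a two-component Gaussian mixture with allocation variables makes every full conditional a standard distribution, so Gibbs sampling applies directly. The following Python/NumPy sketch is only a minimal illustration under assumed settings (known unit variances, a uniform prior on the weight, conjugate normal priors on the means); it is not the chapter's algorithm, and all parameter choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_mixture(x, n_iter=2000, sigma=1.0, prior_mean=0.0, prior_var=10.0):
    """Gibbs sampler for p*N(mu1, sigma^2) + (1-p)*N(mu2, sigma^2),
    completed with latent allocations z_i (illustrative sketch)."""
    n = len(x)
    mu = np.array([x.min(), x.max()])  # crude but convenient initialisation
    p = 0.5
    draws = np.empty((n_iter, 3))
    for t in range(n_iter):
        # 1. Allocations z_i | p, mu: Bernoulli on belonging to component 2
        w1 = p * np.exp(-0.5 * ((x - mu[0]) / sigma) ** 2)
        w2 = (1 - p) * np.exp(-0.5 * ((x - mu[1]) / sigma) ** 2)
        z = rng.random(n) < w2 / (w1 + w2)  # True -> component 2
        n2 = z.sum()
        n1 = n - n2
        # 2. Weight p | z ~ Beta(1 + n1, 1 + n2) under a uniform prior
        p = rng.beta(1 + n1, 1 + n2)
        # 3. Means mu_j | z, x: conjugate normal updates per component
        for j, idx in enumerate([~z, z]):
            nj = idx.sum()
            post_var = 1.0 / (1.0 / prior_var + nj / sigma**2)
            post_mean = post_var * (prior_mean / prior_var + x[idx].sum() / sigma**2)
            mu[j] = rng.normal(post_mean, np.sqrt(post_var))
        draws[t] = (p, mu[0], mu[1])
    return draws

# Simulated data from a 0.3 N(-2,1) + 0.7 N(2,1) mixture
x = np.concatenate([rng.normal(-2, 1, 60), rng.normal(2, 1, 140)])
draws = gibbs_mixture(x)
post = draws[500:].mean(axis=0)  # posterior means after discarding burn-in
```

Each sweep simulates the allocations given the parameters and then the parameters given the allocations, which is exactly the completion idea exploited throughout the chapter. Note that such a sampler is exposed to label switching; here the well-separated simulated components keep the labels stable.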
This chapter aims to introduce the reader to the construction, prior modelling, estimation and evaluation of mixture distributions in a Bayesian paradigm. We will show that mixture distributions provide a flexible, parametric framework for statistical modelling and analysis. The focus is on methods rather than advanced examples, in the hope that an understanding of the practical aspects of such modelling can be carried into many disciplines. The chapter also stresses implementation via specific MCMC algorithms that can be easily reproduced by the reader. In Section 1.2, we detail some basic properties of mixtures, along with two different motivations. Section 1.3 points out the fundamental difficulty in conducting inference with such objects, along with a discussion of prior modelling, which is more restrictive than usual, and of the construction of estimators, which is also more involved than the standard posterior mean solution. Section 1.4 describes the completion and non-completion MCMC algorithms that can be used to approximate the posterior distribution of the mixture parameters, followed by an extension of this analysis in Section 1.5 to the case in which the number of components is unknown and may be estimated by Green's (1995) reversible jump algorithm and Stephens' (2000) birth-and-death procedure.
Section 1.6 gives some pointers to related models and problems like mixtures of regressions (or conditional mixtures) and hidden Markov models (or dependent mixtures), as well as Dirichlet priors.