Imputing missing genotypes with weighted k nearest neighbors
Motivation: Missing values are a common problem in genetic association studies concerned with single nucleotide polymorphisms (SNPs). Since most statistical methods cannot handle missing values, observations containing them usually have to be removed prior to the actual analysis. Considering only complete observations, however, often leads to an immense loss of information. Procedures are therefore needed that can replace such missing values. In this article, we propose a method based on weighted k nearest neighbors for imputing missing genotypes.

Results: Compared with other imputation approaches, our procedure, called KNNcatImpute, shows the lowest rates of falsely imputed genotypes when applied to the SNP data from the GENICA study, a study dedicated to the identification of genetic and gene-environment interactions associated with sporadic breast cancer. Moreover, in contrast to other imputation methods that take all variables into account when replacing missing values of a particular variable, KNNcatImpute is not restricted to association studies comprising several tens to a few hundred SNPs, but can also be applied to data from whole-genome studies, as an application to a subset of the HapMap data shows.
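The abstract does not spell out the algorithm. The following is therefore only a minimal Python sketch of weighted k-nearest-neighbor imputation for categorical genotypes and not the authors' KNNcatImpute implementation. It rests on several assumptions. Genotypes are coded 1, 2, 3, with `None` marking a missing value. Neighbors are searched among the other SNPs rather than among subjects. The distance between two SNPs is the fraction of co-observed subjects whose genotypes differ. Each neighbor's vote is weighted by its inverse distance. The function names `mismatch_distance` and `knn_impute_snps` are hypothetical.

```python
from collections import defaultdict


def mismatch_distance(a, b):
    """Fraction of co-observed subjects whose genotypes differ (None = missing).

    Returns None if the two SNPs share no observed subjects.
    Assumption: a simple mismatch distance, chosen for illustration only.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None
    return sum(x != y for x, y in pairs) / len(pairs)


def knn_impute_snps(snps, k=3, eps=1e-6):
    """Impute missing genotypes in a list of SNP columns (hypothetical sketch).

    snps[j][i] is the genotype (e.g. 1, 2, 3) of subject i at SNP j, or None.
    For each missing entry, the k SNPs closest to SNP j that are observed for
    subject i vote on the genotype, each with weight 1 / (distance + eps).
    The input is not modified; a new list of columns is returned.
    """
    imputed = [list(col) for col in snps]
    for j, col in enumerate(snps):
        missing = [i for i, g in enumerate(col) if g is None]
        if not missing:
            continue
        # Distances from SNP j to every other SNP, computed once per SNP.
        dists = []
        for m, other in enumerate(snps):
            if m == j:
                continue
            d = mismatch_distance(col, other)
            if d is not None:
                dists.append((d, m))
        dists.sort()
        for i in missing:
            # Nearest SNPs that actually have a genotype for subject i.
            neighbors = [(d, m) for d, m in dists if snps[m][i] is not None][:k]
            if not neighbors:
                continue  # no usable neighbor: leave the value missing
            votes = defaultdict(float)
            for d, m in neighbors:
                votes[snps[m][i]] += 1.0 / (d + eps)
            # Weighted majority vote over the neighbors' genotypes.
            imputed[j][i] = max(votes, key=votes.get)
    return imputed


if __name__ == "__main__":
    snps = [
        [1, 1, 2, None, 3],
        [1, 1, 2, 2, 3],
        [1, 1, 2, 2, 3],
        [3, 2, 1, 1, 1],
    ]
    print(knn_impute_snps(snps, k=3)[0])
```

Running the script prints `[1, 1, 2, 2, 3]`. Two identical SNPs at distance 0 outvote the dissimilar fourth SNP. Because each missing value depends only on its k closest SNPs, this design avoids using all variables at once, which is one way such a procedure could scale to larger panels. The sketch does not handle SNPs whose allele coding is flipped relative to each other, so strongly linked but inversely coded SNPs would look distant.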
Year of publication: | 2008 |
---|---|
Authors: | Schwender, Holger ; Ickstadt, Katja |
Institutions: | Institut für Wirtschafts- und Sozialstatistik, Universität Dortmund |
freely available
Similar items by person
- Detecting high-order interactions of single nucleotide polymorphisms using genetic programming
  Nunkesser, Robin, (2007)
- Identification of SNP interactions using logic regression
  Schwender, Holger, (2006)
- Comparison of the empirical bayes and the significance analysis of microarrays
  Schwender, Holger, (2003)