Imputing missing genotypes with weighted k nearest neighbors

Motivation: Missing values are a common problem in genetic association studies of single nucleotide polymorphisms (SNPs). Since most statistical methods cannot handle missing values, observations containing them have to be removed prior to the actual analysis. Considering only complete observations, however, often leads to an immense loss of information. Procedures are therefore needed to replace such missing values. In this article, we propose a method based on weighted k nearest neighbors for imputing missing genotypes.

Results: In a comparison with other imputation approaches, our procedure, called KNNcatImpute, shows the lowest rates of falsely imputed genotypes when applied to the SNP data from the GENICA study, a study dedicated to the identification of genetic and gene-environment interactions associated with sporadic breast cancer. Moreover, in contrast to other imputation methods that take all variables into account when replacing missing values of a particular variable, KNNcatImpute is not restricted to association studies comprising several tens to a few hundred SNPs, but can also be applied to data from whole-genome studies, as an application to a subset of the HapMap data shows.
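
To illustrate the general idea of weighted k nearest neighbor imputation for categorical genotypes, the following is a minimal Python sketch. It is not the KNNcatImpute procedure itself: the abstract does not specify the distance measure, the weighting scheme, or whether neighbors are subjects or SNPs. The sketch assumes genotypes coded as 0, 1, 2, neighbors taken among subjects (rows), a simple mismatch-fraction distance, and inverse-distance weights. The names `MISSING`, `mismatch_distance`, and `impute_weighted_knn` are hypothetical.

```python
from collections import defaultdict

MISSING = None  # hypothetical marker for a missing genotype (assumption)


def mismatch_distance(a, b):
    """Fraction of jointly observed positions where two genotype vectors differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not pairs:
        return 1.0  # no shared information: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def impute_weighted_knn(data, k=3, eps=1e-6):
    """Return a copy of `data` (rows = subjects, columns = SNPs) with gaps filled.

    Each missing entry is replaced by the genotype with the largest total
    weight among the k nearest rows that have that SNP observed.
    The inverse-distance weighting is an assumption of this sketch.
    """
    imputed = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            # Candidate neighbors: all other rows with SNP j observed.
            candidates = [
                (mismatch_distance(row, other), other[j])
                for m, other in enumerate(data)
                if m != i and other[j] is not MISSING
            ]
            candidates.sort(key=lambda t: t[0])
            votes = defaultdict(float)
            for dist, genotype in candidates[:k]:
                votes[genotype] += 1.0 / (dist + eps)  # closer rows count more
            if votes:  # leave the gap if no neighbor has this SNP observed
                imputed[i][j] = max(votes, key=votes.get)
    return imputed


# Toy data: five subjects, four SNPs, one missing genotype in the second row.
genotypes = [
    [0, 1, 2, 0],
    [0, 1, 2, None],
    [2, 0, 0, 2],
    [0, 1, 2, 0],
    [2, 0, 0, 2],
]
filled = impute_weighted_knn(genotypes, k=3)
```

In this toy example, the two rows that agree with the incomplete row on every observed SNP dominate the weighted vote, so the gap is filled with their genotype. Computing distances only on jointly observed positions lets rows with their own missing values still serve as neighbors.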

Year of publication:	2008
Authors:	Schwender, Holger ; Ickstadt, Katja
Institutions:	Institut für Wirtschafts- und Sozialstatistik, Universität Dortmund

freely available

Extent:	application/pdf
Series:	Technical Reports.
Type of publication:	Book / Working Paper
Language:	English
Notes:	Number 2008,03
Source:	RePEc - Research Papers in Economics

Persistent link: https://www.econbiz.de/10009216957