A boosting approach for adapting the sparsity of risk prediction signatures based on different molecular levels
Risk prediction models can link high-dimensional molecular measurements, such as DNA methylation, to clinical endpoints. For biological interpretation, a sparse fit is often desirable. Different molecular aggregation levels, such as considering DNA methylation at the CpG, gene, or chromosome level, might demand different degrees of sparsity. Hence, model building and estimation techniques should be able to adapt their sparsity to the setting. Additionally, underestimation of coefficients, which is a typical problem of sparse techniques, should be addressed. We propose a comprehensive approach, based on a boosting technique that allows a flexible adaptation of model sparsity and addresses these problems in an integrative way. The main motivation is automatic sparsity adaptation. In a simulation study, we show that this approach reduces underestimation in sparse settings and selects more adequate model sizes than the corresponding non-adaptive boosting technique in non-sparse settings. Using different aggregation levels of DNA methylation data from a study in kidney carcinoma patients, we illustrate how automatically selected values of the sparsity tuning parameter can reflect the underlying structure of the data. In addition, prediction performance and variable selection stability are compared with those of the non-adaptive boosting approach.
Year of publication: | 2014 |
---|---|
Authors: | Murat, Sariyar ; Martin, Schumacher ; Harald, Binder |
Published in: | Statistical Applications in Genetics and Molecular Biology. - De Gruyter, ISSN 1544-6115. - Vol. 13.2014, 3, p. 15-15 |
Publisher: | De Gruyter |
Online Resource
Similar items by person
- Cluster-Localized Sparse Logistic Regression for SNP Data
  Harald, Binder, (2012)
- Harald, Binder, (2008)