Title: Support Vector Machines with L1 penalty for detecting gene-gene interactions
Authors: Yuanyuan Shen; Zhe Liu; Jurg Ott
Addresses: Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA; Department of Statistics, University of Chicago, 5734 S. University Avenue, Chicago, IL 60637, USA; Key Laboratory of Mental Health, Institute of Psychology, Chinese Academy of Sciences, 4A Datun Road, Beijing 100101, China
Abstract: Interactions among genetic variants are likely to affect risk for human complex diseases, and identifying them should increase the power to detect disease-associated variants and help elucidate the biological pathways underlying these diseases. We propose a two-stage approach: 1) model selection with Support Vector Machines identifies the most promising Single Nucleotide Polymorphisms (SNPs) and interactions; 2) logistic regression ensures a valid type I error rate by excluding candidates that are not significant after Bonferroni correction. Simulation studies for case-control data suggest that our method has high power to detect gene-gene interactions. We analyze a published genome-wide case-control dataset, in which our method identifies an interaction term that was missed in previous studies.
Keywords: genome-wide association study; GWAS; human diseases; complex diseases; gene-gene interactions; SVM; support vector machine; model selection; L1 penalty; two-stage method; data mining; bioinformatics; simulation.
DOI: 10.1504/IJDMB.2012.049300
International Journal of Data Mining and Bioinformatics, 2012 Vol. 6 No. 5, pp. 463-470
Received: 03 May 2011
Accepted: 04 May 2011
Published online: 17 Dec 2014
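
Illustrative sketch: the two-stage approach summarized in the abstract (L1-penalized SVM selection, then logistic regression with Bonferroni correction) might look roughly like the code below. This is not the authors' implementation. Everything specific is an assumption made for illustration: the simulated genotypes and effect size, the squared-hinge loss fitted by proximal gradient descent, the penalty weight `lam`, the Wald test for the logistic coefficients, and the choice to Bonferroni-correct over the number of stage-1 candidates. All function names are hypothetical.

```python
import itertools
import math
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrink each weight toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def l1_svm_select(X, y, lam=0.05, n_iter=2000):
    """Stage 1 (assumed form): L1-penalized squared-hinge SVM, y in {-1, +1}.
    Fitted by proximal gradient descent; returns indices of nonzero weights."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Step size 1/L, with L an upper bound on the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * (np.linalg.norm(X, 2) ** 2 / n + 1.0))
    for _ in range(n_iter):
        margin = np.maximum(0.0, 1.0 - y * (X @ w + b))
        grad_w = -2.0 * (X.T @ (y * margin)) / n
        grad_b = -2.0 * np.mean(y * margin)
        w = soft_threshold(w - step * grad_w, step * lam)
        b -= step * grad_b
    return np.flatnonzero(w)

def logistic_wald_pvalues(X, y01, n_iter=25):
    """Stage 2: unpenalized logistic regression by Newton's method.
    Returns two-sided Wald p-values for the non-intercept coefficients."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(A @ beta)))
        H = A.T @ (A * (mu * (1.0 - mu))[:, None])
        beta += np.linalg.solve(H, A.T @ (y01 - mu))
    mu = 1.0 / (1.0 + np.exp(-(A @ beta)))
    H = A.T @ (A * (mu * (1.0 - mu))[:, None])
    z = beta / np.sqrt(np.diag(np.linalg.inv(H)))
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return p[1:]

def two_stage(G, y01, lam=0.05, alpha=0.05):
    """Run both stages on genotypes G (n x p, coded 0/1/2) and 0/1 status y01."""
    p = G.shape[1]
    pairs = list(itertools.combinations(range(p), 2))
    names = [f"snp{j}" for j in range(p)] + [f"snp{a}*snp{b}" for a, b in pairs]
    # Main effects plus all pairwise product terms as interaction features.
    F = np.column_stack([G] + [G[:, a] * G[:, b] for a, b in pairs])
    Z = (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-12)  # standardize for stage 1
    cand = l1_svm_select(Z, 2.0 * y01 - 1.0, lam)
    pvals = logistic_wald_pvalues(F[:, cand], y01)
    # Assumption: Bonferroni correction over the number of stage-1 candidates.
    keep = pvals < alpha / max(len(cand), 1)
    return [names[i] for i in cand], [names[i] for i in cand[keep]], pvals

# Simulated case-control data (hypothetical): risk driven by a snp0 x snp1 interaction.
rng = np.random.default_rng(0)
n, p = 1000, 5
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
logit = -1.0 + 0.8 * G[:, 0] * G[:, 1]
y01 = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

candidates, significant, pvals = two_stage(G, y01)
print("stage 1 candidates:", candidates)
print("stage 2 significant:", significant)
```

The split mirrors the abstract's rationale: the sparse SVM narrows a large feature set to a few candidates, and the regression step then applies a conventional significance test to only those candidates. At genome-wide scale, enumerating all pairwise products as done here would not be feasible without further screening.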