Title: Expression quantitative locus mapping for identification of hotspots using an empirical Bayes mixture model
Authors: Guanglong Jiang; Yingqiang Fu; Pengyue Zhang; Shirin Ardeshir-Rouhani-Fard; Lijun Cheng; Lang Li; Zhigao Li
Addresses: Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA ' Harbin Medical University Cancer Hospital, Harbin, Heilongjiang, 150081, China ' Department of Biostatistics, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA ' Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA ' Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA ' Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA ' Harbin Medical University Cancer Hospital, Harbin, Heilongjiang, 150081, China
Abstract: Identifying genomic regions that regulate gene expression can improve our understanding of the mechanisms underlying genetic contributions to phenotypic variation. Hence, we consider a mixture model to locate candidate genomic regions that are more frequently associated with gene expression traits. A modified two-sample t-statistic was used, and single-nucleotide polymorphisms (SNPs) with P-values < 10^-5 were considered for a subsequent two-component negative binomial mixture model. An expectation-maximisation algorithm was adopted to estimate the parameters of the model. The SNPs were then ranked by their false discovery rate (FDR) values, and any SNP with an FDR value < 1% was considered a potential hotspot. Three independent datasets were used to replicate the findings. A number of common hotspots were identified, and many of them have annotated functions as binding sites of transcription factors or histone proteins.
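The pipeline summarised in the abstract (a two-component negative binomial mixture fitted by expectation-maximisation, followed by FDR ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each SNP is summarised by a count of expression traits passing the P-value threshold, that both components share a fixed dispersion `size` (which gives closed-form updates for the means), and that the FDR is estimated as the running mean of the posterior null probabilities. The function names `nb_logpmf`, `hotspot_posterior`, `fit_nb_mixture`, and `bayesian_fdr` are hypothetical.

```python
import math


def nb_logpmf(k, mean, size):
    """Log pmf of a negative binomial parameterised by mean and size."""
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(size / (size + mean))
            + k * math.log(mean / (size + mean)))


def hotspot_posterior(counts, pi1, mu0, mu1, size):
    """E-step: posterior probability that each SNP is in the hotspot component."""
    post = []
    for k in counts:
        l0 = math.log(1.0 - pi1) + nb_logpmf(k, mu0, size)
        l1 = math.log(pi1) + nb_logpmf(k, mu1, size)
        m = max(l0, l1)  # shift by the max for numerical stability
        e0, e1 = math.exp(l0 - m), math.exp(l1 - m)
        post.append(e1 / (e0 + e1))
    return post


def fit_nb_mixture(counts, size=2.0, n_iter=200):
    """EM for a two-component NB mixture with a fixed, shared dispersion.

    Assumption: fixing `size` makes the M-step for each mean a weighted
    average of the counts; the published model may estimate it instead.
    """
    n = len(counts)
    srt = sorted(counts)
    low, high = srt[:max(1, n // 2)], srt[-max(1, n // 10):]
    mu0 = max(sum(low) / len(low), 1e-3)          # null (background) mean
    mu1 = max(sum(high) / len(high), mu0 + 1.0)   # hotspot mean
    pi1 = 0.1                                     # hotspot proportion
    for _ in range(n_iter):
        post = hotspot_posterior(counts, pi1, mu0, mu1, size)
        w1 = sum(post)
        w0 = n - w1
        pi1 = min(max(w1 / n, 1e-6), 1.0 - 1e-6)
        mu1 = max(sum(z * k for z, k in zip(post, counts)) / max(w1, 1e-12), 1e-6)
        mu0 = max(sum((1 - z) * k for z, k in zip(post, counts)) / max(w0, 1e-12), 1e-6)
    if mu1 < mu0:  # keep the hotspot component as the one with the larger mean
        mu0, mu1, pi1 = mu1, mu0, 1.0 - pi1
    return pi1, mu0, mu1, hotspot_posterior(counts, pi1, mu0, mu1, size)


def bayesian_fdr(post):
    """Estimated FDR per SNP: running mean of null posteriors in ranked order."""
    order = sorted(range(len(post)), key=lambda i: 1.0 - post[i])
    fdr = [0.0] * len(post)
    running = 0.0
    for rank, i in enumerate(order, start=1):
        running += 1.0 - post[i]
        fdr[i] = running / rank
    return fdr
```

Under these assumptions, a call such as `pi1, mu0, mu1, post = fit_nb_mixture(counts)` followed by `fdr = bayesian_fdr(post)` yields one estimated FDR per SNP, and SNPs with `fdr[i] < 0.01` would be flagged as candidate hotspots.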
Keywords: genotype; gene expression; expression quantitative trait loci; genome-wide association studies; empirical Bayes; mixture model; transcription factor.
DOI: 10.1504/IJCBDD.2017.083882
International Journal of Computational Biology and Drug Design, 2017 Vol.10 No.2, pp.108 - 122
Received: 15 Aug 2016
Accepted: 19 Sep 2016
Published online: 25 Apr 2017