Title: Effects of input data quantity on genome-wide association studies (GWAS)
Authors: Yan Yan; Connor Burbridge; Jinhong Shi; Juxin Liu; Anthony Kusalik
Addresses: Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon, SK, S7N 5C9 Canada; Global Institute for Food Security, University of Saskatchewan, 110 Gymnasium Place, Saskatoon, SK, S7N 0W9 Canada; Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon, SK, S7N 5C9 Canada; Department of Mathematics and Statistics, University of Saskatchewan, McLean Hall, Saskatoon, SK, S7N 5E6 Canada; Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon, SK, S7N 5C9 Canada
Abstract: Many software packages based on various statistical models have been developed for Genome-Wide Association Studies (GWAS). One key factor influencing the statistical reliability of GWAS is the amount of input data used. In this paper, we investigate how input data quantity influences the output of four widely used GWAS programs (PLINK, TASSEL, GAPIT, and FaST-LMM) in the context of plant genomes and phenotypes. Both synthetic and real data are used. Evaluation is based on the p- and q-values of output SNPs and on the Kendall rank correlation between output SNP lists. Results show that, for the same GWAS program, different Arabidopsis thaliana datasets show similar trends in rank correlation as input quantity varies, but differ in the number of SNPs passing a given p- or q-value threshold. We also show that variation in the number of replicates influences the p-values of SNPs but does not strongly affect the rank correlation.
Keywords: GWAS; genome-wide association study; Arabidopsis thaliana; plant phenomics; plant genomics; PLINK; TASSEL; GAPIT; FaST-LMM; statistical power; input data quantity; epistasis.
DOI: 10.1504/IJDMB.2019.099286
International Journal of Data Mining and Bioinformatics, 2019 Vol.22 No.1, pp.19 - 43
Received: 12 Jan 2019
Accepted: 28 Jan 2019
Published online: 24 Apr 2019