Title: Using multivariate methods to infer knowledge from genomic data
Authors: Liliana López-Kleine; Nicolás Molano; Luis Ospina
Addresses: Statistics Department, Universidad Nacional de Colombia, Colombia; Statistics Department, Universidad Nacional de Colombia, Colombia; Statistics Department, Universidad Nacional de Colombia, Colombia
Abstract: Since the introduction of genome sequencing techniques, several methods for genomic data preprocessing and analysis have been published and applied to answer different biological questions. Multivariate methods, however, have rarely been used to extract knowledge about protein roles. Two of the most informative types of data are gene expression data (microarrays) and phylogenetic profiles, which indicate the presence of genes in other organisms and therefore provide information about their co-evolution. Here we show that these two types of data, analyzed by means of principal component analysis and nonparametric discriminant analysis, provide useful information about protein function and about protein participation in virulence processes.
Keywords: statistical genomics; microarray data; phylogenetic profiles; multivariate statistical analysis; protein function; virulence factors; bioinformatics; gene expression data; principal component analysis; PCA; nonparametric discriminant analysis.
DOI: 10.1504/IJBRA.2013.053607
International Journal of Bioinformatics Research and Applications, 2013 Vol.9 No.3, pp.285 - 300
Received: 02 Mar 2011
Accepted: 31 Aug 2011
Published online: 06 Sep 2014