Title: Pre-processing of microarray gene expression data for classification using adaptive feature selection and imputation of non-ignorable missing values
Authors: R. Devi Priya; R. Sivaraj
Addresses: Department of Information Technology, Kongu Engineering College, Erode, Tamil Nadu, India; Department of Computer Science and Engineering, Velalar College of Engineering and Technology, Erode, Tamil Nadu, India
Abstract: Microarray datasets often contain a large number of features and many missing values. To address these issues, this paper introduces a method called Genetic Algorithm-Based Adaptive Feature Selection with Missing value Imputation (GAFSMI), which makes two contributions. First, Genetic Algorithm-Based Adaptive Feature Selection (GAFS) is proposed to identify the noteworthy features. Second, the Bayesian Genetic Algorithm (BAGEL), which integrates a genetic algorithm with Bayesian principles, is introduced to impute the non-ignorable missing values. Together, these two pre-processing steps produce a complete dataset with an optimal feature subset, so that classification can be performed with better accuracy. The proposed algorithm is applied to eight microarray datasets, and GAFS is observed to select an optimal feature subset that yields better classification accuracy than other feature selection techniques. The imputation accuracy of BAGEL is found to be better than that of other standard imputation techniques at missing rates from 5% to 40%. Classification accuracy improves on all the datasets processed with GAFS and BAGEL.
Keywords: microarray datasets; feature selection; missing values; genetic algorithms; classification accuracy; pre-processing; gene expression data; bioinformatics; imputation accuracy.
DOI: 10.1504/IJDMB.2016.080670
International Journal of Data Mining and Bioinformatics, 2016 Vol. 16 No. 3, pp. 183-204
Received: 14 Nov 2015
Accepted: 11 Sep 2016
Published online: 01 Dec 2016