Title: Usage of ensemble model and genetic algorithm in pipeline for feature selection from cancer microarray data
Authors: Sahu Barnali; Dehuri Satchidananda; Jagadev Alok Kumar
Addresses: Department of Computer Science and Engineering, Siksha 'O' Anusandhan (deemed to be) University, Bhubaneswar, 751030, Odisha, India; P.G. Department of Information and Communication Technology, Fakir Mohan University, Vyasa Vihar, Balasore, 756019, Odisha, India; School of Computer Engineering, Kalinga Institute of Industrial Technology (deemed to be) University, Bhubaneswar, 751024, Odisha, India
Abstract: This paper proposes an ensemble of feature selection techniques followed by a genetic algorithm (GA) in a pipeline for selecting features from microarray data. The ensemble combines filter-based and wrapper-based feature selection methods. The GA stage then refines the ensemble output to produce a robust, non-local feature subset. An extensive computational experiment on a prostate cancer dataset validates the method and compares it with the group genetic algorithm (GGA). Finally, the feature subsets produced by GA, GGA, and the individual constituents of the ensemble in standalone mode are used to uncover frequent patterns with Apriori and FP-growth. The experimental study shows that the proposed method achieves classification accuracies of 100%, 98.34%, 98.02%, and 97% with an ensemble of classifiers for 5, 10, 15, and 20 features, respectively, compared with 92.34%, 90.34%, 86.54%, and 87.21% for GGA.
Keywords: microarray data; differentially expressed genes; ensemble feature selection; Apriori; FP-growth.
DOI: 10.1504/IJBRA.2020.109100
International Journal of Bioinformatics Research and Applications, 2020, Vol. 16, No. 3, pp. 217-244
Received: 16 Mar 2017
Accepted: 22 Jan 2018
Published online: 20 Aug 2020