Title: Extending meta-learning framework for clustering gene expression data with component-based algorithm design and internal evaluation measures
Authors: Milan Vukicevic; Sandro Radovanovic; Boris Delibasic; Milija Suknovic
Addresses: Faculty of Organizational Sciences, University of Belgrade, Jove Ilica 154, Belgrade, Serbia; Faculty of Organizational Sciences, University of Belgrade, Jove Ilica 154, Belgrade, Serbia; Faculty of Organizational Sciences, University of Belgrade, Jove Ilica 154, Belgrade, Serbia; Faculty of Organizational Sciences, University of Belgrade, Jove Ilica 154, Belgrade, Serbia
Abstract: Class retrieval in gene expression microarray data analysis is a highly challenging task. Because of high class imbalance, a high-dimensional feature space and a small number of samples, most algorithms fail to capture the real complex structures in the data (the 'gold standard'). Therefore, one of the major problems in this area is the selection of the best-suited algorithm for the data at hand. We address this problem by proposing an extended meta-learning framework for ranking and selecting algorithms for clustering gene expression microarray data. The proposed framework introduces several improvements over the original one: an extended algorithm space, an extended meta-feature space, and the introduction of cutting-edge techniques for meta-feature selection and parameter optimisation of meta-algorithms. The system was evaluated on a large algorithm and problem space (504 algorithms and 30 datasets) and showed very promising results in predicting algorithm performance for specific problems.
Keywords: meta-learning; clustering algorithms; gene expression; regression; bioinformatics; class retrieval; algorithm ranking; algorithm selection; internal evaluation; meta-feature selection; parameter optimisation; feature selection.
DOI: 10.1504/IJDMB.2016.074682
International Journal of Data Mining and Bioinformatics, 2016 Vol.14 No.2, pp.101 - 119
Received: 11 Sep 2014
Accepted: 25 Mar 2015
Published online: 13 Feb 2016