Title: Biomarker identification from gene expression: an effective computational pipeline
Authors: Emon Asad; Ayatullah Faruk Mollah
Addresses: Department of Computer Science and Engineering, Aliah University, IIA/27 New Town, Kolkata, 700160, India; Department of Computer Science and Engineering, Aliah University, IIA/27 New Town, Kolkata, 700160, India
Abstract: Discovering biomarkers from microarray data is an important research problem, as biomarkers help to diagnose disease types, guide therapeutic plans for a disease, and carry crucial biological information about organisms. This paper presents a machine learning-based two-stage biomarker identification technique for microarray datasets. In the first stage, analysis of variance (ANOVA) F-scores are used to select the top quartile of genes as candidate biomarkers. In the second stage, the performance of the candidate biomarkers is examined with an ensemble classifier, and the responsible biomarker(s) are identified by their ability to characterise the corresponding genetic disease(s). The method yields 100% classification accuracy with only one biomarker for each of the six different types of publicly available microarray datasets considered in this work, which compares favourably with many state-of-the-art methods. The selected biomarkers are also found to be biologically relevant and meaningful in terms of gene ontology, DisGeNET and various biochemical pathway terms.
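The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the ensemble classifier or the evaluation protocol, so the bagged nearest-class-mean voters, the leave-one-out evaluation, the synthetic data, and all function names below are assumptions chosen to keep the example dependency-free.

```python
import random
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F-score of one gene across class labels (needs >= 2 classes)."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_quartile(X, y):
    """Stage 1: indices of the genes whose F-scores fall in the top 25%."""
    n_genes = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
    return ranked[: max(1, n_genes // 4)]

def bagged_predict(train_x, train_y, x, rng, n_models=15):
    """Assumed ensemble: majority vote of nearest-class-mean models,
    each fitted on a bootstrap resample of the training samples."""
    votes = {}
    for _ in range(n_models):
        idx = [rng.randrange(len(train_x)) for _ in train_x]
        by_class = {}
        for i in idx:
            by_class.setdefault(train_y[i], []).append(train_x[i])
        pred = min(by_class, key=lambda c: abs(x - mean(by_class[c])))
        votes[pred] = votes.get(pred, 0) + 1
    return max(votes, key=votes.get)

def loo_accuracy(x, y, seed=0):
    """Leave-one-out accuracy of the ensemble using a single gene."""
    rng = random.Random(seed)
    hits = 0
    for i in range(len(x)):
        tr_x, tr_y = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        hits += bagged_predict(tr_x, tr_y, x[i], rng) == y[i]
    return hits / len(x)

def best_single_gene(X, y):
    """Stage 2: score each candidate alone and keep the most accurate one."""
    candidates = top_quartile(X, y)
    acc = {j: loo_accuracy([row[j] for row in X], y) for j in candidates}
    best = max(candidates, key=lambda j: acc[j])
    return best, acc[best]

# Synthetic stand-in for a microarray matrix: 20 samples x 8 genes,
# with gene 3 planted as the only class-separating feature.
data_rng = random.Random(42)
y = [0] * 10 + [1] * 10
X = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in y]
for row, label in zip(X, y):
    row[3] = 10.0 * label + data_rng.uniform(-1, 1)

best, accuracy = best_single_gene(X, y)
print(best, accuracy)
```

On real expression data, the F-scores would more typically come from an established statistics library, and the accuracy estimate should use a protocol in which gene selection is repeated inside each training fold, so that the held-out samples do not influence which genes are chosen.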
Keywords: biomarker identification; genetic diseases; microarray gene expression; feature selection; analysis of variance; ANOVA.
DOI: 10.1504/IJBRA.2024.138715
International Journal of Bioinformatics Research and Applications, 2024 Vol.20 No.2, pp.181 - 203
Received: 30 Jun 2023
Accepted: 09 Oct 2023
Published online: 29 May 2024