Title: Biomarker identification from gene expression: an effective computational pipeline

Authors: Emon Asad; Ayatullah Faruk Mollah

Addresses: Department of Computer Science and Engineering, Aliah University, IIA/27 New Town, Kolkata, 700160, India; Department of Computer Science and Engineering, Aliah University, IIA/27 New Town, Kolkata, 700160, India

Abstract: Discovering biomarkers from microarray data is an important research subject, as biomarkers help to diagnose disease types, guide therapeutic plans for a disease, and carry crucial biological information about organisms. In this paper, a machine learning-based two-stage biomarker identification technique for microarray datasets is presented. In the first stage, analysis of variance (ANOVA) F-scores are used to select the top quartile of genes as candidate biomarkers. In the second stage, the performance of the candidate biomarkers is examined with an ensemble classifier, and the responsible biomarker(s) are identified based on their ability to characterise the corresponding genetic disease(s). The method yields 100% classification accuracy with only one biomarker for each of the six different types of publicly available microarray datasets considered in this work, outperforming many state-of-the-art methods. The selected biomarkers are also found to be biologically relevant and meaningful in terms of gene ontology, DisGeNET and various biochemical pathway terms.
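
The two stages described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The toy expression values, the gene names g1 to g4, and the helper names anova_f_score, top_quartile and loo_accuracy are all hypothetical. The abstract does not specify the ensemble classifier, so stage two here substitutes a simple nearest-class-mean rule evaluated by leave-one-out, purely as a stand-in.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic for one gene across sample classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_quartile(expression, labels):
    """Stage 1: keep the 25% of genes with the highest F-score, best first."""
    scores = {g: anova_f_score(vals, labels) for g, vals in expression.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[: max(1, len(ranked) // 4)]


def loo_accuracy(values, labels):
    """Stage 2 stand-in (assumption): leave-one-out accuracy of a
    nearest-class-mean rule on a single gene, NOT the paper's ensemble."""
    correct = 0
    for i in range(len(values)):
        held_in = {}
        for j, (v, y) in enumerate(zip(values, labels)):
            if j != i:
                held_in.setdefault(y, []).append(v)
        pred = min(held_in, key=lambda y: abs(values[i] - mean(held_in[y])))
        correct += pred == labels[i]
    return correct / len(values)


# Hypothetical toy data: 4 genes, 6 samples, two classes (0 and 1).
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "g1": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "g2": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "g3": [3.0, 1.0, 2.0, 2.5, 1.5, 2.2],
    "g4": [0.5, 0.7, 0.6, 0.9, 0.4, 0.8],
}

candidates = top_quartile(expression, labels)
accuracies = {g: loo_accuracy(expression[g], labels) for g in candidates}
print(candidates, accuracies)
```

On this toy data only g1 separates the two classes, so it is the single candidate kept by the top-quartile filter. With real microarray data, a library routine such as scikit-learn's f_classif would typically replace the hand-written F-score.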

Keywords: biomarker identification; genetic diseases; microarray gene expression; feature selection; analysis of variance; ANOVA.

DOI: 10.1504/IJBRA.2024.138715

International Journal of Bioinformatics Research and Applications, 2024 Vol.20 No.2, pp.181 - 203

Received: 30 Jun 2023
Accepted: 09 Oct 2023

Published online: 29 May 2024
