Title: Prediction of essential genes using single nucleotide compositional features in genomes of bacteria: a machine learning-based analysis
Authors: Annushree Kurmi; Piyali Sen; Madhusmita Dash; Aswini Kumar Patra; Suvendra Kumar Ray; Siddhartha Sankar Satapathy
Addresses: Department of Computer Science and Engineering, Tezpur University, Napaam-784028, Assam, India; Department of Computer Science and Engineering, Tezpur University, Napaam-784028, Assam, India; Department of Electronics and Communication Engineering, NIT, Jote-791113, Arunachal Pradesh, India; Department of Computer Science and Engineering, North Eastern Regional Institute of Science and Technology (NERIST), Nirjuli (Itanagar)-791109, Arunachal Pradesh, India; Department of Molecular Biology and Biotechnology, Tezpur University, Napaam-784028, Assam, India; Department of Computer Science and Engineering, Tezpur University, Napaam-784028, Assam, India
Abstract: Essential genes are crucial for understanding the cellular processes of an organism. In this article, we present an extensive machine learning-based analysis of single nucleotide composition in 35 bacterial genomes across several phylogenetic groups. To distinguish essential genes from the remaining genes, we used seven machine learning classifiers: logistic regression, Gaussian Naïve Bayes, k-nearest neighbours, decision tree, random forest, extreme gradient boosting and support vector machine. The random forest classifier was the better performer among the seven and achieved an AUC score of at least 70% for thirteen organisms. Higher AUC scores were achieved for several organisms, such as Salmonella enterica, Sphingomonas wittichii, Bacillus thuringiensis, and Streptococcus pyogenes. Overall, the prediction results suggest that single nucleotide compositional features may be useful for predicting gene essentiality in some bacterial species, though not universally.
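As a rough illustration of what a single nucleotide compositional feature might look like, the sketch below computes the fraction of each base in a gene sequence. The abstract does not specify the exact feature definition, so the function name `single_nucleotide_composition`, the choice of plain A/C/G/T fractions, and the example sequence are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def single_nucleotide_composition(seq):
    """Return the fraction of A, C, G and T in a gene sequence.

    Hypothetical helper: the paper's actual feature set may differ.
    Characters other than A, C, G, T (for example N) are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        # No usable bases: return an all-zero feature vector.
        return {base: 0.0 for base in "ACGT"}
    return {base: counts[base] / total for base in "ACGT"}


# Made-up example sequence, not taken from any genome in the study.
features = single_nucleotide_composition("ATGGCGTAA")
```

One such vector per gene, paired with an essential/non-essential label, could then be passed to a classifier (for instance scikit-learn's `RandomForestClassifier`) and scored with an AUC metric such as `roc_auc_score`.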
Keywords: essential genes; single nucleotide composition; bacterial genome; machine learning.
DOI: 10.1504/IJBRA.2023.131276
International Journal of Bioinformatics Research and Applications, 2023 Vol.19 No.1, pp.1 - 18
Received: 30 Mar 2022
Accepted: 04 Jan 2023
Published online: 05 Jun 2023