Title: Prediction of ncRNA from RNA-Seq data using machine learning techniques
Authors: Faroza Shamsheem; Tunga Arundhathi; Khaleda Afroaz
Addresses: Department of CS & IT, Maulana Azad National Urdu University, Gachibowli, Hyderabad, Telangana, 500032, India; Department of CS & IT, Maulana Azad National Urdu University, Gachibowli, Hyderabad, Telangana, 500032, India; Department of CS & IT, Maulana Azad National Urdu University, Gachibowli, Hyderabad, Telangana, 500032, India
Abstract: Non-coding RNAs (ncRNAs) are receiving growing attention in bioinformatics and biology. They play crucial roles in biological processes such as transcription and translation. Classifying ncRNAs is necessary to better understand the causes of disease and to develop effective treatments. Beyond distinguishing coding from non-coding transcripts, it is desirable to assign non-coding transcripts to specific classes. Several approaches exist for this task, but their classification performance remains a major limitation. In this study, we first developed machine learning models to separate coding transcripts from non-coding transcripts, and then classified the ncRNAs into their respective classes. We assessed four machine learning methods on a human dataset: logistic regression, random forest, XGBoost, and decision tree. Of these four, random forest achieved the highest accuracy, nearly 83%.
Keywords: ncRNA; lncRNA; prediction; machine learning.
DOI: 10.1504/IJBRA.2023.132630
International Journal of Bioinformatics Research and Applications, 2023 Vol.19 No.2, pp.116 - 124
Received: 25 Aug 2022
Accepted: 04 Jan 2023
Published online: 31 Jul 2023
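
The two-stage scheme summarised in the abstract (first coding versus non-coding, then the ncRNA class) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The feature names (`length`, `orf_fraction`), the threshold rules, the class labels, and the toy samples are all assumptions made for illustration. The two placeholder functions stand in for trained models such as the random forest mentioned in the abstract.

```python
from typing import Callable, Dict, Sequence

Features = Dict[str, float]


def two_stage_predict(
    x: Features,
    is_coding: Callable[[Features], bool],
    ncrna_class: Callable[[Features], str],
) -> str:
    """Stage 1 filters out coding transcripts; stage 2 labels the rest."""
    if is_coding(x):
        return "coding"
    return ncrna_class(x)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical placeholder rules; a real pipeline would call trained classifiers here.
def toy_is_coding(x: Features) -> bool:
    return x["orf_fraction"] >= 0.5  # assumed feature and cut-off


def toy_ncrna_class(x: Features) -> str:
    # lncRNAs are conventionally longer than 200 nucleotides.
    return "lncRNA" if x["length"] > 200 else "small ncRNA"


# Made-up toy transcripts and labels, used only to exercise the functions.
samples = [
    {"length": 1500.0, "orf_fraction": 0.8},
    {"length": 900.0, "orf_fraction": 0.1},
    {"length": 22.0, "orf_fraction": 0.0},
    {"length": 150.0, "orf_fraction": 0.2},
]
labels = ["coding", "lncRNA", "small ncRNA", "lncRNA"]

predictions = [two_stage_predict(s, toy_is_coding, toy_ncrna_class) for s in samples]
print(predictions, accuracy(labels, predictions))
```

Keeping the two stages as separate callables means either one could be swapped for a different model without touching the other, which is one plausible way to compare several classifiers at each stage.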