Title: Prediction of ncRNA from RNA-Seq data using machine learning techniques

Authors: Faroza Shamsheem; Tunga Arundhathi; Khaleda Afroaz

Addresses: Department of CS & IT, Maulana Azad National Urdu University, Gachibowli, Hyderabad, Telangana, 500032, India; Department of CS & IT, Maulana Azad National Urdu University, Gachibowli, Hyderabad, Telangana, 500032, India; Department of CS & IT, Maulana Azad National Urdu University, Gachibowli, Hyderabad, Telangana, 500032, India

Abstract: Non-coding RNAs (ncRNAs) are receiving growing attention in bioinformatics and biology. They play crucial roles in biological processes such as transcription and translation. Classifying ncRNAs helps to clarify the causes of disease and to develop effective treatments. Beyond distinguishing coding from non-coding transcripts, it is desirable to assign non-coding transcripts to specific classes. Several approaches exist for this task, but their classification performance remains a major limitation. In this study, we first developed machine learning models to separate coding transcripts from non-coding transcripts, and then classified the ncRNAs into their respective classes. We assessed four machine learning methods on a human dataset: logistic regression, random forest, XGBoost, and decision tree. Of these, random forest achieved the highest accuracy, at nearly 83%.
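The two-stage scheme described in the abstract (first coding versus non-coding, then a class for each non-coding transcript) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which features or ncRNA classes were used, so the open-reading-frame feature, the 0.5 threshold, the label names, and the function names below are all assumptions. The two rule-based functions are placeholders for trained models such as logistic regression or random forest. The 200-nucleotide cut-off reflects the conventional length definition of lncRNA.

```python
from typing import Callable, Sequence

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length (nt) of the longest ATG..stop ORF in the three forward frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def stage1_is_noncoding(seq: str) -> bool:
    # Placeholder for a trained coding/non-coding model.
    # Assumption: a transcript mostly covered by one ORF is called coding.
    return longest_orf_length(seq) < 0.5 * len(seq)


def stage2_ncrna_class(seq: str) -> str:
    # Placeholder for a trained multi-class ncRNA model.
    # Assumption: only two illustrative classes, split at 200 nt.
    return "lncRNA" if len(seq) > 200 else "small_ncRNA"


def two_stage_predict(
    seq: str,
    is_noncoding: Callable[[str], bool] = stage1_is_noncoding,
    ncrna_class: Callable[[str], str] = stage2_ncrna_class,
) -> str:
    """Run stage 2 only on transcripts that stage 1 calls non-coding."""
    if not is_noncoding(seq):
        return "coding"
    return ncrna_class(seq)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Synthetic toy sequences, not real transcripts.
    toy = {
        "ATG" + "GCT" * 100 + "TAA": "coding",
        "GC" * 150: "lncRNA",
        "GC" * 30: "small_ncRNA",
    }
    predictions = [two_stage_predict(s) for s in toy]
    print(predictions, accuracy(list(toy.values()), predictions))
```

Passing the two stages as callables keeps the cascade independent of the underlying model, so any of the four methods named in the abstract could be substituted at either stage and compared using the same accuracy function.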

Keywords: ncRNA; lncRNA; prediction; machine learning.

DOI: 10.1504/IJBRA.2023.132630

International Journal of Bioinformatics Research and Applications, 2023 Vol.19 No.2, pp.116 - 124

Received: 25 Aug 2022
Accepted: 04 Jan 2023

Published online: 31 Jul 2023
