Title: LIFT: lncRNA identification and function-prediction tool
Authors: Sumukh Deshpande; James Shuttleworth; Jianhua Yang; Sandy Taramonli; Matthew England
Addresses: Central Biotechnology Services (CBS), College of Biomedical and Life Sciences, Cardiff University, Sir Geraint Evans Building (Room 1/14), Heath Park, Cardiff, CF14 4XN, UK; School of Computing, Electronics and Mathematics, Coventry University, Coventry, CV1 2JH, UK; Department of Computer Science, University of Warwick, 6, Lord Bhattacharyya Way, Coventry, CV4 7EZ, UK; Faculty of Engineering, Environment and Computing, School of Computing, Electronics and Mathematics, Coventry University, Coventry, CV1 2JH, UK; Faculty of Engineering, Environment and Computing, School of Computing, Electronics and Mathematics, Coventry University, Coventry, CV1 2JH, UK
Abstract: Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs that play a significant role in several biological processes. Accurate identification and sub-classification of lncRNAs is crucial for exploring their characteristic functions in the genome, because most coding potential computation (CPC) tools fail to accurately identify and classify lncRNAs, or to predict their biological functions, in plant species. In this study, a novel computational framework, the lncRNA identification and function-prediction tool (LIFT), has been developed. It implements least absolute shrinkage and selection operator (LASSO) optimisation and iterative random forests classification for selection of optimal features, a novel position-based classification (PBC) method for sub-classifying lncRNAs into different classes, and a Bayesian-based function prediction approach for annotating lncRNA transcripts. Using LASSO, LIFT selected 31 optimal features and achieved a 15-30% improvement in prediction accuracy on plant species when evaluated against state-of-the-art CPC tools. Using PBC, LIFT successfully identified the intergenic and antisense transcripts with greater accuracy in the A. thaliana and Z. mays datasets.
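To make the idea of classifying transcripts by position concrete, the following is a minimal sketch and not LIFT's actual PBC method, whose rules the abstract does not give. It assumes the commonly used definitions: a transcript is intergenic if it overlaps no annotated gene, and antisense if every gene it overlaps lies on the opposite strand. The names `Interval`, `overlaps` and `classify_by_position`, the fallback label "sense-overlapping", and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic feature; coordinates are assumed 1-based and inclusive."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def classify_by_position(transcript: Interval, genes: list[Interval]) -> str:
    """Assign a positional class under the assumed rules described above."""
    hits = [g for g in genes if overlaps(transcript, g)]
    if not hits:
        return "intergenic"          # no annotated gene at this locus
    if all(g.strand != transcript.strand for g in hits):
        return "antisense"           # only opposite-strand genes overlap
    return "sense-overlapping"       # hypothetical catch-all for other cases


# Hypothetical annotation: two genes on chr1.
genes = [
    Interval("chr1", 1000, 2000, "+"),
    Interval("chr1", 5000, 6000, "-"),
]

print(classify_by_position(Interval("chr1", 3000, 3500, "+"), genes))
print(classify_by_position(Interval("chr1", 1500, 1800, "-"), genes))
```

A real implementation would likely use an interval tree or sorted coordinates rather than a linear scan, and would distinguish further classes (for example intronic or bidirectional transcripts); the linear scan is kept here only for readability.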
Keywords: lncRNA; long non-coding RNAs; LASSO; least absolute shrinkage and selection operator; iterative random forests; PBC; position-based classification; BMRF; Bayesian Markov random fields; function prediction.
DOI: 10.1504/IJBRA.2021.120535
International Journal of Bioinformatics Research and Applications, 2021 Vol.17 No.6, pp.512 - 536
Received: 20 Sep 2018
Accepted: 02 Oct 2019
Published online: 25 Jan 2022