Title: Identification of true EST alignments for recognising transcribed regions
Authors: Chuang Ma; Jia Wang; Lun Li; Mo-Jie Duan; Yan-Hong Zhou
Addresses: Hubei Bioinformatics and Molecular Imaging Key Laboratory, Huazhong University of Science and Technology, Wuhan 430074, China (all authors)
Abstract: Transcribed regions can be determined by aligning Expressed Sequence Tags (ESTs) with genome sequences. The core of this strategy is to distinguish true EST alignments from spurious ones effectively. In this study, three measures, Direction Check, Identity Check and Terminal Check, were introduced to eliminate spurious EST alignments more effectively. On the basis of these measures and other widely used measures, a computational tool named ESTCleanser has been developed to identify true EST alignments and thereby obtain reliable transcribed regions. The performance of ESTCleanser has been evaluated on the well-annotated human ENCyclopedia of DNA Elements (ENCODE) regions using human ESTs in the dbEST database. The evaluation results show that the accuracy of ESTCleanser at the exon and intron levels is markedly higher than that of UCSC spliced EST alignments. This work should be helpful to EST-based research on finding new genes, complementing genome annotation, and recognising alternative splicing events and Single Nucleotide Polymorphisms (SNPs).
Keywords: expressed sequence tag; EST alignment; genome sequence; protein coding genes; transcribed regions; measure; filtering criteria; genome annotation; alternative splicing; computational genomics; data mining; bioinformatics.
DOI: 10.1504/IJDMB.2011.043029
International Journal of Data Mining and Bioinformatics, 2011 Vol.5 No.5, pp.465 - 484
Received: 12 Aug 2009
Accepted: 10 Dec 2009
Published online: 24 Jan 2015