Title: Automatic extraction of reference gene from literature in plants based on text mining
Authors: He Lin; Shen Gengyu; Li Fei; Huang Shuiqing
Addresses: Department of Information Management, Nanjing Agricultural University, XuanWu District, Nanjing 210095, China; Library of Nanjing Agricultural University, XuanWu District, Nanjing 210095, China; Department of Entomology, Nanjing Agricultural University, XuanWu District, Nanjing 210095, China; Department of Information Management, Nanjing Agricultural University, XuanWu District, Nanjing 210095, China
Abstract: Real-Time Quantitative Polymerase Chain Reaction (qRT-PCR) is widely used in biological research. Selecting a stable reference gene is key to the validity of a qRT-PCR experiment. However, choosing an appropriate reference gene usually requires rigorous and costly biological experiments for validation. The scientific literature has accumulated many findings on reference gene selection. Mining the literature for reference genes validated under specific experimental conditions can therefore supply reliable reference genes for similar qRT-PCR experiments, reliably, economically and efficiently. This paper proposes an auxiliary method for discovering reference genes in the literature that integrates machine learning, natural language processing and text mining. Validation tests showed that the new method achieves better precision and recall in extracting reference genes and their experimental conditions.
Keywords: biological knowledge discovery; machine learning; NLP; natural language processing; reference genes; text mining; real-time quantitative PCR; polymerase chain reaction; bioinformatics; gene extraction; gene discovery.
DOI: 10.1504/IJDMB.2015.070063
International Journal of Data Mining and Bioinformatics, 2015 Vol.12 No.4, pp. 400-416
Received: 22 Nov 2013
Accepted: 24 Jun 2014
Published online: 26 Jun 2015
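The abstract does not spell out the extraction pipeline. As a rough illustration of the extraction target (a reference gene paired with the experimental condition under which it was found stable) and of how precision and recall are scored, here is a minimal sketch. It is not the authors' method: it replaces their machine-learning and NLP components with a tiny hand-made gene lexicon and regular-expression cues, and the lexicon (`GENE_LEXICON`), cue phrases, helper names, example sentences and gold pairs are all hypothetical, invented for illustration.

```python
import re

# Hypothetical mini-lexicon of plant reference genes; a real system would
# use gene-name recognition rather than a fixed list.
GENE_LEXICON = {"ACT", "EF1a", "GAPDH", "TUB", "UBQ", "18S rRNA"}

# Hypothetical cue phrases marking a gene as a validated (stable) reference.
CUE = re.compile(r"\b(most stable|stably expressed|suitable reference gene)", re.IGNORECASE)

# Hypothetical pattern for an experimental condition such as "under drought stress".
CONDITION = re.compile(r"\bunder ([a-z]+(?: [a-z]+)? stress)\b", re.IGNORECASE)


def extract_pairs(sentences):
    """Return a set of (gene, condition) pairs from sentences containing a cue."""
    pairs = set()
    for sentence in sentences:
        if not CUE.search(sentence):
            continue  # no evidence that any gene here was judged stable
        match = CONDITION.search(sentence)
        condition = match.group(1).lower() if match else None
        # Every lexicon gene in a cue sentence is paired with that sentence's condition.
        for gene in GENE_LEXICON:
            if re.search(rf"\b{re.escape(gene)}\b", sentence):
                pairs.add((gene, condition))
    return pairs


def precision_recall(predicted, gold):
    """Precision = TP / |predicted|, recall = TP / |gold|, using exact pair matches."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented example sentences and gold annotations, for illustration only.
SENTENCES = [
    "EF1a and UBQ were the most stable genes under drought stress in rice leaves.",
    "GAPDH was a suitable reference gene under salt stress.",
    "ACT expression varied strongly between tissues.",
    "TUB was stably expressed, unlike ACT, under cold stress.",
    "Under heat stress, ACT remained the best-performing control.",
]
GOLD = {
    ("EF1a", "drought stress"),
    ("UBQ", "drought stress"),
    ("GAPDH", "salt stress"),
    ("TUB", "cold stress"),
    ("ACT", "heat stress"),
}

if __name__ == "__main__":
    predicted = extract_pairs(SENTENCES)
    p, r = precision_recall(predicted, GOLD)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.80
```

The fourth sentence yields a false positive (ACT is mentioned only as the unstable contrast), and the fifth is missed because it uses no listed cue phrase. Both errors show why naive pattern matching is a weak baseline, and why a learned, syntax-aware extractor such as the one the abstract describes is needed to push precision and recall higher.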