Title: Mining gene-centric relationships from literature: the roles of gene mutation and gene expression in supporting drug discovery
Authors: Luis Tari; Jagruti Patel; Jan Küntzer; Ying Li; Zhengwei Peng; Yuan Wang; Laura Aguiar; James Cai
Addresses: Pharmaceutical Research and Early Development (pRED) Informatics, Hoffmann-La Roche Inc., Nutley, NJ 07110, USA ' Pharmaceutical Research and Early Development (pRED) Informatics, Hoffmann-La Roche Inc., Nutley, NJ 07110, USA ' Pharmaceutical Research and Early Development (pRED) Informatics, Roche Diagnostics GmbH, 82377 Penzberg, Bavaria, Germany ' Pharmaceutical Research and Early Development (pRED) Informatics, Hoffmann-La Roche Inc., Nutley, NJ 07110, USA ' Pharmaceutical Research and Early Development (pRED) Informatics, Hoffmann-La Roche Inc., Nutley, NJ 07110, USA ' Pharmaceutical Research and Early Development (pRED) Informatics, F. Hoffmann-La Roche AG, 4070 Basel, Switzerland ' Pharmaceutical Research and Early Development (pRED) Informatics, Hoffmann-La Roche Inc., Nutley, NJ 07110, USA ' Pharmaceutical Research and Early Development (pRED) Informatics, Hoffmann-La Roche Inc., Nutley, NJ 07110, USA
Abstract: Identifying drug target candidates is an important task in the early stages of the drug discovery process. This task is supported by new high-throughput technologies that enable a better understanding of disease mechanisms, which makes effective analysis of the resulting large amounts of biological data critical. However, because much biological knowledge is recorded in the literature as natural language text, the analysis and interpretation of high-throughput data have not reached their potential effectiveness. In this paper, we describe our use of text mining to find scientific information for target and biomarker discovery in the biomedical literature. Our approach utilises natural language processing techniques to capture linguistic patterns for the extraction of biological knowledge from text. Additionally, we discuss how the extracted knowledge is used in the analysis of biological data such as next-generation sequencing and gene expression data.
Keywords: literature text mining; drug discovery; gene mutations; phenotypes; gene expression; drug targets; biomarkers; information extraction; natural language processing; NLP; knowledge extraction; biological knowledge; bioinformatics.
DOI: 10.1504/IJDMB.2014.064888
International Journal of Data Mining and Bioinformatics, 2014 Vol.10 No.4, pp.357 - 373
Received: 03 May 2012
Accepted: 04 May 2012
Published online: 21 Oct 2014