Title: Named entity recognition and classification in biomedical text using classifier ensemble
Authors: Sriparna Saha; Asif Ekbal; Utpal Kumar Sikdar
Addresses: Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, India; Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, India; Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, India
Abstract: Named Entity Recognition and Classification (NERC) is an important task in information extraction in the biomedical domain. Biomedical named entities include mentions of proteins, genes, DNA, RNA, etc., which in general have complex structures and are difficult to recognise. In this paper, we propose a Single Objective Optimisation-based classifier ensemble technique that uses the search capability of a Genetic Algorithm (GA) for NERC in biomedical texts. Here, the GA is used to quantify the voting weight of each classifier for each class. We use diverse classification methods, such as Conditional Random Field and Support Vector Machine, to build a number of models that differ in their representations of the feature set and/or feature templates. The proposed technique is evaluated on two benchmark datasets, namely JNLPBA 2004 and GENETAG. Experiments yield overall F-measure values of 75.97% and 95.90%, respectively. Comparisons with existing systems show that our proposed system achieves state-of-the-art performance.
Keywords: biomedical information retrieval; named entity recognition; named entity classification; biomedical texts; single objective optimisation; genetic algorithms; classifier ensemble; data mining; bioinformatics; conditional random field; support vector machines; SVM modelling.
DOI: 10.1504/IJDMB.2015.067954
International Journal of Data Mining and Bioinformatics, 2015 Vol.11 No.4, pp.365 - 391
Received: 20 Aug 2012
Accepted: 21 Feb 2013
Published online: 12 Mar 2015
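
The abstract describes an ensemble in which each classifier's vote for each class carries its own weight. The following is a minimal sketch of that combination step only, not the authors' implementation. The function name `weighted_vote`, the label names, and the weight values are hypothetical; in the paper the weights are found by a Genetic Algorithm, which is not shown here.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine one token's predictions from several classifiers.

    predictions: list of labels, one per classifier.
    weights: list of dicts, one per classifier, mapping label -> vote weight
             (hypothetical values; a GA would search for these).
    """
    scores = defaultdict(float)
    for clf_index, label in enumerate(predictions):
        # A classifier contributes only to the class it predicted,
        # scaled by its weight for that class.
        scores[label] += weights[clf_index].get(label, 0.0)
    # Highest total wins; sorting the labels makes ties deterministic.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical per-classifier, per-class vote weights.
weights = [
    {"B-protein": 0.9, "B-DNA": 0.4, "O": 0.7},  # e.g. a CRF-based model
    {"B-protein": 0.5, "B-DNA": 0.8, "O": 0.6},  # e.g. an SVM-based model
    {"B-protein": 0.6, "B-DNA": 0.7, "O": 0.5},  # another model variant
]

predictions = ["B-protein", "B-DNA", "B-DNA"]
print(weighted_vote(predictions, weights))
```

With these assumed weights, the two lower-weighted votes for the same class together outweigh the single higher-weighted vote, which is the behaviour that distinguishes per-class weighted voting from picking the most trusted classifier.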