Title: Disambiguation of semantic types in complex noun phrases for extracting candidate terms
Authors: Imene Bentounsi; Zizette Boufaïda
Addresses: Department of Software Technologies and Information Systems, LIRE Laboratory, University of Constantine 2-Abdelhamid Mehri, Constantine 25000, Algeria (both authors)
Abstract: Mapping concepts from medical resources to structured medical documents is a prerequisite for many automatic document processing tasks. These resources offer an abundance of material for representing any given concept. They may also include ambiguous terms in unstructured form, which distort the results of automated biomedical text mining. This paper is an exploratory study of semantic type disambiguation for extracting a structured taxonomy from unstructured reports. Specifically, the terms to be disambiguated are those that have more than one semantic type in the Unified Medical Language System (UMLS) Metathesaurus. We propose a word sense disambiguation algorithm that uses as its knowledge base the UMLS is-a hierarchy, augmented with a higher level representing semantic groups. The aim is to explore all possible commonalities in order to classify simple or compound candidate terms by their Nearest Common Kinship (NCK). Experiments with the training corpora provide encouraging results.
Keywords: metadata; noise reduction; word sense disambiguation; semantics; semantic types; term extraction; automatic document processing; XML; medical ontology; ontologies; complex noun phrases; medical resources; medical documents; biomedical text mining; structured taxonomy; unstructured reports; medical reports; UMLS; nearest common kinship; NCK.
DOI: 10.1504/IJMSO.2015.070830
International Journal of Metadata, Semantics and Ontologies, 2015 Vol.10 No.2, pp.112 - 122
Received: 25 Jun 2014
Accepted: 27 Apr 2015
Published online: 28 Jul 2015
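
The disambiguation idea summarised in the abstract can be illustrated with a small sketch. The abstract does not define how Nearest Common Kinship is computed, so the code below assumes it behaves like a nearest-common-ancestor search over an is-a tree whose top level holds semantic groups. The hierarchy, every label in it, and the function names are hypothetical placeholders; they are not actual UMLS semantic types or the authors' implementation.

```python
# Toy is-a hierarchy (child -> parent). All labels are hypothetical
# placeholders, not actual UMLS semantic types or semantic groups.
PARENT = {
    "GROUP_X": "ROOT",      # assumed semantic-group level
    "GROUP_Y": "ROOT",
    "type_x1": "GROUP_X",   # assumed semantic-type level
    "type_x2": "GROUP_X",
    "type_y1": "GROUP_Y",
}


def ancestors(node):
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def nearest_common_ancestor(a, b):
    """Return (ancestor, distance) for the first node shared by both chains.

    distance is the total number of is-a edges from a and from b
    up to that shared ancestor.
    """
    chain_b = ancestors(b)
    for steps_a, node in enumerate(ancestors(a)):
        if node in chain_b:
            return node, steps_a + chain_b.index(node)
    raise ValueError("no common ancestor")


def disambiguate(candidate_types, context_type):
    """Pick the candidate type closest to the context type in the tree.

    Assumption: 'closest' means the smallest edge distance through the
    nearest common ancestor; the paper may use a different criterion.
    """
    return min(
        candidate_types,
        key=lambda t: nearest_common_ancestor(t, context_type)[1],
    )
```

For example, if an ambiguous head noun carries the candidate types `type_x1` and `type_y1`, and a modifier in the same noun phrase has the type `type_x2`, then `disambiguate(["type_y1", "type_x1"], "type_x2")` selects `type_x1`, because the two types meet at `GROUP_X` rather than at `ROOT`.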