Title: Semantic similarity measurement: an intrinsic information content model
Authors: Abhijit Adhikari; Biswanath Dutta; Animesh Dutta; Deepjyoti Mondal
Addresses: School of Computer Science and Engineering, Vellore Institute of Technology-AP, Amaravati 522237, Andhra Pradesh, India; Documentation Research and Training Centre, Indian Statistical Institute, Bangalore 560059, Karnataka, India; Department of Computer Science and Engineering, National Institute of Technology, Durgapur 713209, West Bengal, India; Department of Software Engineering, Media.Net (Directi), Maharashtra 400069, Mumbai, India
Abstract: Ontology-dependent Semantic Similarity (SS) measurement has emerged as a new research paradigm for finding the semantic strength between any two entities. Among the available approaches, the information-theoretic intrinsic approach has been observed to yield better accuracy in correlation with human cognition. The precision of such a technique depends heavily on how accurately the Information Content (IC) of concepts is calculated and on its compatibility with an SS model. In this work, we develop an intrinsic IC model to facilitate better SS measurement. The proposed model has been evaluated using three vocabularies, namely SNOMED CT, MeSH and WordNet, against a set of benchmark data sets. We compare the results with state-of-the-art IC models. The results show that the proposed intrinsic IC model yields a high correlation with human assessment. The article also evaluates the compatibility of the proposed IC model and other existing IC models in combination with a set of state-of-the-art SS models.
Keywords: semantic similarity; knowledge-based systems; ontology; intrinsic information content; natural language processing.
DOI: 10.1504/IJMSO.2020.112803
International Journal of Metadata, Semantics and Ontologies, 2020 Vol.14 No.3, pp.218 - 233
Received: 25 May 2020
Accepted: 05 Oct 2020
Published online: 03 Feb 2021