Title: A new entropy-based heuristic for automatic enrichment of ontologies
Authors: Rabah Mazouzi; Hien Tran; Patrice Darmon
Addresses: Department of Research and Innovation, CGI Inc., Puteaux, 92800, France ' Department of Research and Innovation, CGI Inc., Puteaux, 92800, France ' Department of Research and Innovation, CGI Inc., Puteaux, 92800, France
Abstract: In this paper, we propose a new method for automatic ontology enrichment. The ontology considered in this work contains a set of computer science skills and is used to score and rank profiles according to their curricula vitae (CVs). Starting from an initial ontology defined manually by experts, named entities are extracted from CVs; the TF-IDF method is then used to determine which words are most likely to represent skills in a given area of expertise. The words obtained are inserted into the branch of the ontology corresponding to that area of expertise. The enrichment is controlled by a heuristic based on an entropy-like quantity that measures the disorder in the profile ranking caused by inserting a new word, on the assumption that this disorder is correlated with the amount of noise in the ontology. The experimental results confirm that the entropy-like quantity can be used to control the automatic enrichment of the ontology.
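Illustrative sketch: the pipeline summarized in the abstract (TF-IDF scoring of candidate words, then an entropy-controlled insertion) could be approximated as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not define the entropy-like quantity, so the Shannon entropy of the rank-shift distribution is used here as a stand-in. The profile score (a count of matching skills), the function names, and the acceptance threshold are all hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf} dict per tokenized, non-empty document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def rank_profiles(cvs, skills):
    """Rank CVs by how many ontology skills they mention (assumed scoring).

    Returns ranks[i], the 0-based position of CV i; ties keep input order.
    """
    scores = [sum(1 for tok in cv if tok in skills) for cv in cvs]
    order = sorted(range(len(cvs)), key=lambda i: -scores[i])
    ranks = [0] * len(cvs)
    for pos, i in enumerate(order):
        ranks[i] = pos
    return ranks


def ranking_disorder(before, after):
    """Shannon entropy of the distribution of absolute rank shifts.

    Assumed stand-in for the paper's entropy-like quantity: 0.0 when the
    ranking is unchanged, larger when profiles move by varied amounts.
    """
    n = len(before)
    shifts = Counter(abs(b - a) for b, a in zip(before, after))
    return sum((c / n) * math.log(n / c) for c in shifts.values())


def accept_term(cvs, skills, term, threshold):
    """Keep a candidate term only if it disturbs the ranking little.

    `threshold` is a hypothetical tuning parameter.
    """
    before = rank_profiles(cvs, skills)
    after = rank_profiles(cvs, skills | {term})
    return ranking_disorder(before, after) <= threshold
```

In such a setup, candidate words would be taken from the highest TF-IDF scores within one area of expertise and passed to `accept_term` one at a time, so that a word is added to the ontology branch only when the resulting ranking disorder stays below the chosen threshold.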
Keywords: ontology enrichment; text mining; entropy; natural language processing; NLP; fuzzy ontology; documents matching.
DOI: 10.1504/IJITCC.2023.132846
International Journal of Information Technology, Communications and Convergence, 2023 Vol. 4 No. 2, pp. 153-166
Accepted: 06 Feb 2023
Published online: 11 Aug 2023