Title: Domain-specific schema discovery from general-purpose knowledge base

Authors: Everaldo Costa Neto; Johny Moreira; Luciano Barbosa; Ana Carolina Salgado

Addresses: Centro de Informatica, Universidade Federal de Pernambuco, Recife, Pernambuco, Brazil (all authors)

Abstract: General-purpose Knowledge Bases (KBs) have been used in a wide range of applications. An essential step in leveraging the content of a KB for domain-specific tasks is to discover its schema. In this paper, we propose ANCHOR, an end-to-end pipeline for automated schema discovery from a general-purpose KB. ANCHOR first identifies a domain of interest based on the KB's category mappings. Next, it learns representations of the entities in this domain from the entity-category mappings and uses these representations to identify the entities' topics within the domain. Finally, ANCHOR generates a profile for each topic using a strategy based on attribute co-occurrence. We have evaluated ANCHOR on four domains. The results show that: (1) the learned entity representation produces better entity clusters than some traditional and embedding-based baselines; (2) our solution produces a high-quality profile for the discovered topics.
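The abstract does not specify how the co-occurrence-based profile is computed, so the following is only a minimal sketch of one plausible reading, not ANCHOR's actual algorithm. It assumes that each entity in a topic carries a set of attributes, that the most frequent attribute acts as a seed, and that an attribute joins the profile when it co-occurs with the seed in a sufficient share of the seed's entities. The function name `build_profile`, the `min_ratio` threshold, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_profile(entity_attrs, min_ratio=0.5):
    """Illustrative topic profile from attribute co-occurrence.

    Assumption (not taken from the paper): an attribute belongs to the
    profile if it co-occurs with the topic's most frequent attribute in
    at least `min_ratio` of the entities that carry that attribute.
    """
    attr_count = Counter()   # attribute -> number of entities having it
    pair_count = Counter()   # sorted attribute pair -> co-occurrence count
    for attrs in entity_attrs.values():
        unique = sorted(set(attrs))
        attr_count.update(unique)
        pair_count.update(combinations(unique, 2))

    if not attr_count:
        return []

    # Seed: the most frequent attribute, ties broken alphabetically.
    seed = min(attr_count, key=lambda a: (-attr_count[a], a))

    profile = [seed]
    for attr in sorted(attr_count):
        if attr == seed:
            continue
        pair = tuple(sorted((seed, attr)))
        if pair_count[pair] / attr_count[seed] >= min_ratio:
            profile.append(attr)
    return profile


# Hypothetical entities of one discovered topic and their attributes.
topic_entities = {
    "entity_a": ["name", "birth_date", "team"],
    "entity_b": ["name", "team", "position"],
    "entity_c": ["name", "birth_date"],
    "entity_d": ["name", "team"],
}
profile = build_profile(topic_entities)
print(profile)
```

In this toy input, `position` appears alongside the seed attribute in only one of four entities, so it falls below the assumed threshold and is left out of the profile.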

Keywords: schema discovery; knowledge base; topic identification; entity representation.

DOI: 10.1504/IJMSO.2023.137159

International Journal of Metadata, Semantics and Ontologies, 2023 Vol.16 No.3, pp.210 - 226

Received: 09 Nov 2022
Received in revised form: 15 Apr 2023
Accepted: 21 Aug 2023

Published online: 04 Mar 2024
