Title: An effective and time-efficient approach for Linked Data fusion using genetic algorithms
Authors: Khayra Bencherif; Mimoun Malki
Addresses: EEDIS Laboratory, Djilali Liabes University, Sidi Bel Abbes, Algeria; LabRI Laboratory, High School of Computer Science ESI, Sidi Bel Abbes, Algeria
Abstract: The Linked Open Data (LOD) Cloud is a project that uses the RDF formalism to publish data on the web as triples under an open licence. With the ever-increasing number of data sets available in the LOD Cloud, integrating heterogeneous data manually is already beyond human capability. Linked Data fusion currently takes a significant amount of time owing to the large number of instances in the data sets from the LOD Cloud. In this paper, we propose a new system to efficiently combine heterogeneous data from the LOD Cloud. First, we extract similar instances from the LOD Cloud to identify identical or related information. Then, our system collects all predicates and objects of the similar instances to construct a set of trees. Finally, we propose a genetic algorithm to merge the data in the constructed trees. We give an overview of our system architecture and detail our genetic algorithm. We also evaluate our system on real data sets, showing that it can increase completeness and conciseness in data fusion. Moreover, we show that our system is faster when fusing large data sets from the LOD Cloud.
Keywords: linked data; data integration; data fusion; genetic algorithms; Linked Open Data; LOD Cloud.
DOI: 10.1504/IJMSO.2016.080349
International Journal of Metadata, Semantics and Ontologies, 2016 Vol.11 No.2, pp.110 - 123
Received: 28 Apr 2016
Accepted: 02 Aug 2016
Published online: 16 Nov 2016
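
Illustrative sketch: the three steps named in the abstract (gather similar instances, group their predicates and objects into trees, merge with a genetic algorithm) can be pictured with the toy Python below. The abstract does not specify the encoding, fitness function, or operators, so everything here is an assumption made for illustration: the names `build_trees`, `fitness` and `fuse`, the one-gene-per-predicate chromosome, the "number of agreeing sources" fitness, and the `ex:` sample data are all hypothetical and are not the authors' method.

```python
import random
from collections import defaultdict


def build_trees(instances):
    """Group the (predicate, object) pairs of similar instances by predicate.

    Assumption: a "tree" is modelled as a predicate with a list of candidate objects.
    """
    trees = defaultdict(list)
    for pairs in instances:
        for predicate, obj in pairs:
            trees[predicate].append(obj)
    return dict(trees)


def fitness(chromosome, predicates, trees):
    """Toy fitness (assumed): how many sources agree with each chosen value."""
    return sum(trees[p].count(trees[p][g]) for p, g in zip(predicates, chromosome))


def fuse(instances, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Pick one object per predicate with a small elitist genetic algorithm."""
    rng = random.Random(seed)
    trees = build_trees(instances)
    predicates = sorted(trees)

    def random_chromosome():
        # One gene per predicate: an index into that predicate's candidate list.
        return [rng.randrange(len(trees[p])) for p in predicates]

    population = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, predicates, trees), reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # One-point crossover (no real cut is possible with a single gene).
            cut = rng.randrange(1, len(predicates)) if len(predicates) > 1 else 0
            child = a[:cut] + b[cut:]
            # Mutation: occasionally re-draw the value chosen for a predicate.
            for i, p in enumerate(predicates):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(len(trees[p]))
            children.append(child)
        population = parents + children

    best = max(population, key=lambda c: fitness(c, predicates, trees))
    return {p: trees[p][g] for p, g in zip(predicates, best)}


# Made-up descriptions of the same entity from three hypothetical sources.
similar_instances = [
    [("ex:name", "Sidi Bel Abbes"), ("ex:country", "ex:Algeria")],
    [("ex:name", "Sidi Bel Abbes"), ("ex:population", "toy-value-a")],
    [("ex:name", "Sidi-Bel-Abbes"), ("ex:country", "ex:Algeria"),
     ("ex:population", "toy-value-b")],
]
fused = fuse(similar_instances)
print(fused)
```

In this sketch the fused record keeps every predicate seen in any source (a crude stand-in for completeness) and exactly one value per predicate (a crude stand-in for conciseness); a real system would need a fitness function that reflects the quality criteria the paper actually defines.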