Title: Global schema as local data integrator using active learning to identify candidates attributes
Authors: Clóvis Santos; Carina Dorneles
Addresses: Institute of Exact and Natural Sciences, Federal University of Rondonópolis - UFR, Rondonópolis, MT, Brazil; Department of Informatics and Statistics, Federal University of Santa Catarina - UFSC, Florianópolis, SC, Brazil
Abstract: Data integration remains a challenge in application development. Although several approaches to data integration exist, such as federated and distributed databases, standardising distinct data sources is still a problem, because different companies develop their systems with different paradigms and concepts. In this paper, we present a case study in the agriculture and environment domain that addresses an essential point in data integration: identifying attributes whose content is close to the characteristics required by the proposed schema. Information technology experts in agribusiness helped map the most relevant attributes for the investigated scenario. In our experimental tests, we used quantitative data analysis to validate the results, comparing the percentages of proximity between attribute contents in the databases. Our proposal offers an alternative that simplifies data integration without intermediate application or middleware layers. Proximity was measured on a scale from 0% to 100% to identify candidate attributes, and the approach identified attributes in the databases in almost 67% of the cases.
Keywords: agribusiness; database; text mining; data extraction; machine learning.
DOI: 10.1504/IJAMS.2023.134427
International Journal of Applied Management Science, 2023 Vol.15 No.4, pp.296 - 310
Received: 20 May 2022
Accepted: 13 Nov 2022
Published online: 23 Oct 2023
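
The abstract describes scoring the proximity between attribute contents on a 0% to 100% scale in order to pick candidate attributes. The sketch below illustrates that general idea only. It is not the authors' method: the abstract does not say which measure they used, so Jaccard overlap of word tokens is an assumption made for illustration, and the active learning step named in the title is not modelled. The function names and the sample column names (crop_name, city) are hypothetical.

```python
import re


def tokens(values):
    """Collect lowercased word tokens from all values of one attribute (column)."""
    out = set()
    for value in values:
        out.update(re.findall(r"\w+", str(value).lower()))
    return out


def proximity_percent(values_a, values_b):
    """Jaccard overlap of the two token sets, scaled to the range 0-100.

    Assumption: Jaccard overlap stands in for whatever proximity
    measure the paper actually uses.
    """
    a, b = tokens(values_a), tokens(values_b)
    if not a and not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


def rank_candidates(reference_values, source_columns):
    """Return (attribute name, score) pairs, highest proximity first."""
    scores = [
        (name, proximity_percent(reference_values, values))
        for name, values in source_columns.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample: expected content for one global-schema attribute,
# compared against two columns from a local database.
reference = ["soy", "corn", "cotton"]
source = {
    "city": ["Rondonópolis", "Sorriso"],
    "crop_name": ["Soy", "Corn", "wheat"],
}
ranking = rank_candidates(reference, source)
```

With this sample data, crop_name shares two of four distinct tokens with the reference values and ranks first, while city shares none. A threshold on the score would then decide which attributes count as candidates; the abstract does not state one, so none is fixed here.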