Title: Enrichment of data in digital documents with metadata extraction
Authors: Clovis Dos Santos Júnior; Carina Friedrich Dorneles
Addresses: Institute of Exact and Natural Sciences, Federal University of Rondonópolis (UFR), Rondonópolis, Mato Grosso, Brazil; Department of Informatics and Statistics, Federal University of Santa Catarina (UFSC), Florianópolis, Santa Catarina, Brazil
Abstract: Companies have migrated their operational activities from paper documents to automated processes with fully digital storage. This trend is positive, but in most cases printed documents cannot be discarded for administrative or legal reasons. This research used data extraction to enrich the database of a Non-Governmental Organisation (NGO) that monitors the use of public financial resources in counties. The implementation analysed digital files containing official documents and identified the most frequently occurring words in each, using the algorithms presented in the research results. The resulting solution adds metadata to the documents to improve document search in the database and the procedural follow-up of administrative and judicial actions. Keywords were extracted successfully from each document; the results section presents examples and shows the steps used to add the metadata.
Keywords: electronic document; text mining; data extraction; NGO.
DOI: 10.1504/IJMSO.2023.135335
International Journal of Metadata, Semantics and Ontologies, 2023 Vol.16 No.2, pp.187 - 193
Received: 30 Jul 2022
Received in revised form: 17 Mar 2023
Accepted: 22 Mar 2023
Published online: 05 Dec 2023
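
The abstract describes identifying the most frequent words in each document and attaching them as metadata. The following is a minimal sketch of that general idea only, not the authors' algorithms, which are given in the paper itself. The function names `extract_keywords` and `enrich`, the `keywords` field, the tiny stop-word list, and the minimum word length are all assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); a real system would use a
# fuller list suited to the language of the documents.
STOP_WORDS = {"the", "of", "and", "to", "in", "a", "for", "is", "on", "by", "with", "that"}


def extract_keywords(text, top_k=5):
    """Return the top_k most frequent words in text, ignoring stop words."""
    # Lowercase the text and keep runs of letters only (accented letters included).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # Count words that are not stop words and are longer than two letters.
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_k)]


def enrich(record, text, top_k=5):
    """Return a copy of a document record with a 'keywords' field added."""
    enriched = dict(record)  # leave the original record untouched
    enriched["keywords"] = extract_keywords(text, top_k)
    return enriched
```

For example, `enrich({"id": 1}, document_text)` would return the same record with a `keywords` list that a database search could then index. Raw frequency is the simplest possible ranking; weighting schemes such as TF-IDF are a common alternative when many documents share the same vocabulary.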