Title: Adaptable address parser with active learning
Authors: You-Xuan Lin
Addresses: National Center for Research on Earthquake Engineering, No. 200, Sec. 3, Xinhai Rd., Da'an Dist., Taipei City 106219, Taiwan
Abstract: Address parsing, the decomposition of address strings into semantically meaningful components, converts unstructured or semi-structured address data into structured form. The flexibility and variability of real-world address formats make parser development a non-trivial task. Even after considerable time and effort have been invested in building a capable parser, out-of-domain data still require the model to be updated or even retrained, which incurs extra cost. To minimise the cost of model building and updating, this study experiments with active learning for model training and adaptation. Models composed of character-level embeddings and recurrent neural networks are trained to parse addresses in Taiwan. Results show that, with active learning, adding 420 instances to the training data is sufficient for a model to adapt to unfamiliar data while retaining its competence in the original domain. This suggests that active learning is helpful for model adaptation when data labelling is expensive and limited.
Keywords: address parsing; record linkage; active learning; model adaptation; recurrent neural network; RNN; address in Taiwan.
DOI: 10.1504/IJDMMM.2023.129991
International Journal of Data Mining, Modelling and Management, 2023 Vol.15 No.1, pp.79 - 101
Received: 18 Nov 2021
Accepted: 28 Jan 2022
Published online: 04 Apr 2023
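
The abstract does not state which query strategy the study uses, so the following is only a minimal sketch of one common active learning step, least-confidence sampling, to illustrate how a small labelling budget might be spent on the addresses a parser is least sure about. The names `sequence_uncertainty`, `select_queries` and `toy_predict` are hypothetical, and `toy_predict` is a stand-in for a trained character-level model, not the author's parser.

```python
import math
from typing import Callable, List, Sequence

# Per-character label probabilities for one address string.
CharProbs = Sequence[Sequence[float]]


def sequence_uncertainty(char_probs: CharProbs) -> float:
    """Least-confidence score: 1 minus the product of each character's top label probability."""
    log_confidence = sum(math.log(max(p)) for p in char_probs)
    return 1.0 - math.exp(log_confidence)


def select_queries(
    pool: Sequence[str],
    predict_proba: Callable[[str], CharProbs],
    budget: int,
) -> List[str]:
    """Return the `budget` unlabelled addresses with the highest uncertainty."""
    scored = [(sequence_uncertainty(predict_proba(addr)), addr) for addr in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [addr for _, addr in scored[:budget]]


def toy_predict(address: str) -> CharProbs:
    # Hypothetical stand-in for a trained parser: confident on ASCII
    # characters, maximally unsure on everything else (two labels only).
    return [[0.9, 0.1] if ch.isascii() else [0.5, 0.5] for ch in address]


pool = ["No. 200", "台北市大安區", "Sec. 3 辛亥路"]
to_label = select_queries(pool, toy_predict, budget=2)
```

In a full loop, the selected addresses would be labelled by hand, added to the training data, and the model retrained before the next round of selection.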