Handling imbalanced resources and loanwords in Vietnamese-Bahnaric neural machine translation
Online publication date: Tue, 01-Oct-2024
by Long-Ngo-Hoang Bui; Huu-Thien-Phu Nguyen; Minh-Khoi Le; Cong-Thien Pham; Thanh-Tho Quan
International Journal of Intelligent Information and Database Systems (IJIIDS), Vol. 16, No. 4, 2024
Abstract: Machine translation is a crucial application. Recent deep learning (DL) architectures have helped neural machine translation (NMT) reach significant milestones, narrowing the gap between human and machine translation. However, NMT still faces challenges with the extremely low-resource languages of ethnic groups, e.g., Bahnaric in Vietnam. The challenges come from the imbalance of language resources compared with the target languages, which also causes loanwords to occur frequently in the target language. In this paper, we propose a novel solution for handling the scarcity problem in NMT, inspired by work on incorporating contextual embeddings from pre-trained language models in BERT-fused NMT. We combine both solutions to form one model that effectively handles imbalanced-resource and loanword scenarios. Experimental results show its effectiveness on the Vietnamese-Bahnaric pair, where it outperforms the state-of-the-art BERT-fused NMT by more than five BLEU points.