Chinese and Vietnamese cross-lingual topic discovery based on word similarity of comparable corpus
Online publication date: Mon, 08-Jul-2024
by Zhengtao Yu; Linjie Xia; Peili Tang; Xiaocong Wang; Shengxiang Gao
International Journal of Information and Communication Technology (IJICT), Vol. 25, No. 1, 2024
Abstract: To address the scarcity of Chinese-Vietnamese comparable corpora and the limited scale of bilingual dictionaries, we propose a method for cross-language topic discovery based on the similarity between Chinese and Vietnamese words. First, we train word vectors on a Chinese-Vietnamese comparable corpus to represent the bilingual texts, and calculate the similarity between Chinese query words and Vietnamese words. Then, we select the Vietnamese words that are similar to the Chinese words as expansion words. Next, a Chinese-Vietnamese translation model is constructed from the word similarities; it is used to look up the Vietnamese words and return the related Vietnamese documents. Finally, the AP algorithm is used to obtain the Vietnamese documents related to the Chinese text. The experimental results show that the proposed method achieves good results in accuracy and recall.
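The similarity and word-selection steps described in the abstract can be illustrated with a minimal sketch. It assumes that Chinese and Vietnamese word vectors already live in one shared space and that similarity is measured by cosine; the abstract does not state either detail. The function names, the toy vectors, and the `top_k` and `threshold` parameters are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(zh_vector, vi_vectors, top_k=2, threshold=0.5):
    """Return up to top_k Vietnamese words most similar to one Chinese query vector.

    vi_vectors maps each Vietnamese word to its vector. Words scoring below
    `threshold` are dropped. Both parameters are illustrative assumptions.
    """
    scored = [(word, cosine(zh_vector, vec)) for word, vec in vi_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(word, score) for word, score in scored[:top_k] if score >= threshold]


# Toy vectors, invented for illustration only (not trained on any corpus).
vi_vectors = {
    "kinh tế": [0.9, 0.1, 0.0],
    "thị trường": [0.7, 0.3, 0.1],
    "bóng đá": [0.0, 0.95, 0.2],
}
zh_queries = {
    "经济": [1.0, 0.0, 0.0],
    "足球": [0.0, 1.0, 0.0],
}

# A toy "translation table": each Chinese query word mapped to its expansion words.
translation_table = {
    zh_word: expand_query(vec, vi_vectors) for zh_word, vec in zh_queries.items()
}

for zh_word, candidates in translation_table.items():
    print(zh_word, [word for word, _ in candidates])
```

The resulting table plays the role of a very small translation model: each Chinese query word maps to Vietnamese candidates that could then be used to retrieve Vietnamese documents. The final clustering step with the AP algorithm is not shown here.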