Inclusion of Wikipedia, a language specific knowledge resource to generate and update a synset in WordNet
Online publication date: Tue, 10-Dec-2019
by Sunny Rai; Amita Jain; Priyank Pandey
International Journal of Technology, Policy and Management (IJTPM), Vol. 19, No. 4, 2019
Abstract: The lack of competent lexical resources is a pervasive problem that hampers the development of natural language processing tools for less widely spoken languages. Recently, projects such as Indo WordNet have significantly reduced the scarcity of lexicons for Indian languages. However, their coverage is still a matter of concern, and the cost and time incurred are further limiting factors. The reluctance to automate lexicon generation is largely attributed to the poor precision of automatically generated synsets. In this paper, we tackle these issues by incorporating language-specific knowledge resources, which ensure the authenticity of the generated synsets and allow the inclusion of endemic words. We propose a corpus-based approach for automated synset generation which visibly improves the quality of the generated synsets. Experiments performed on a manually created dataset of Hindi words yield a precision of 81.56% and an F-measure of more than 72%.
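The abstract reports precision and F-measure but does not say how they are computed. As a minimal sketch, assuming the common set-based definitions in which a generated synset is compared against a manually created gold synset, the scores could be obtained as follows. The function name `synset_scores` and the placeholder words are hypothetical and are not taken from the paper.

```python
def synset_scores(generated, gold):
    """Return (precision, recall, f_measure) for one generated synset.

    Assumption: both synsets are treated as plain sets of words, and a
    generated word counts as correct only if it appears in the gold synset.
    """
    generated, gold = set(generated), set(gold)
    true_positives = len(generated & gold)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        f_measure = 0.0
    else:
        # Harmonic mean of precision and recall (F1).
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical placeholder words, not data from the paper.
generated = ["w1", "w2", "w3", "w4"]
gold = ["w1", "w2", "w3", "w5", "w6"]

precision, recall, f_measure = synset_scores(generated, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f_measure={f_measure:.2f}")
```

In this toy case three of the four generated words are in the gold synset, so precision is 0.75, while recall is 0.60 because two gold words are missing. The F-measure sits between the two, which is why a reported F-measure can be noticeably lower than the reported precision.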