Title: Contributions to the automatic processing of the user-generated Tunisian dialect on the social web
Authors: Jihene Younes; Hadhemi Achour; Emna Souissi; Ahmed Ferchichi
Addresses: ISGT, Université de Tunis, LR99ES04 BESTMOD, 2000, Le Bardo, Tunisia; ISGT, Université de Tunis, LR99ES04 BESTMOD, 2000, Le Bardo, Tunisia; ENSIT, Université de Tunis, 1008, Montfleury, Tunisia; ISGT, Université de Tunis, LR99ES04 BESTMOD, 2000, Le Bardo, Tunisia
Abstract: With the growing use of social media in the Arab world, Arabic dialects are spreading rapidly on the web, attracting increasing interest from NLP researchers. These dialects are, however, still under-resourced languages, which is a major obstacle to their study and processing. In this paper, we focus on the automatic processing of the user-generated Tunisian dialect (TD) on the social web and propose an approach that helps to automatically generate TD language resources. This approach exploits the large amounts of text produced on the social web to extract and generate dialectal content. It is based on two main NLP components: TD identification and TD transliteration. A machine learning approach using conditional random fields is proposed for implementing these two components; it reached an accuracy of 87.45% for TD identification and 90.49% for the automatic generation of dialectal content by transliteration.
Keywords: Tunisian dialect; TD; language resources; LR; corpora; lexica; identification; transliteration; natural language processing; NLP; machine learning.
DOI: 10.1504/IJCISTUDIES.2020.106487
International Journal of Computational Intelligence Studies, 2020, Vol. 9, No. 1/2, pp. 33-51
Received: 06 Mar 2018
Accepted: 06 Sep 2018
Published online: 09 Apr 2020