Title: A comparison of text classification methods using different stemming techniques
Authors: Mariem Bounabi; Karim El Moutaouakil; Khalid Satori
Addresses: Computer Sciences, Imaging and Numerical Analysis Laboratory (LIIAN), USMBA University Fes, Fez City, Morocco; Hoceima National School of Applied Sciences (ENSAH), Mohammed First University, Al-Hoceima, Morocco; Computer Sciences, Imaging and Numerical Analysis Laboratory (LIIAN), USMBA University Fes, Fez City, Morocco
Abstract: In information retrieval, two factors have an important impact on system performance: feature extraction and the matching process. In this work, we compare three well-known stemming techniques: the Lovins stemmer, the iterated Lovins stemmer and the Snowball stemmer. For the classification phase, we experimentally compare six methods: BNET, NBMU, CNB, RF, SLogicF, and SVM. Based on this comparison, we propose a new retrieval system that uses the voting method as a matching tool to improve on the performance of the classical systems. We use the TF-IDF algorithm to extract features. The proposed systems are tested on two databases: BBCNEWS and BBCSPORT. The systems based on the Lovins stemmers and on the voting technique give the best results. For the first database, the best accuracy is obtained by the Lovins + Vote system, with a recognition rate of 97%. For the second database, the Snowball + Vote system gives a recognition rate of 99%.
Keywords: NBMU; SVM; RF; NB; SLogiF; CNB; voting technique; classification; stemmer; term-weighting.
DOI: 10.1504/IJCAT.2019.101171
International Journal of Computer Applications in Technology, 2019 Vol.60 No.4, pp.298 - 306
Received: 31 May 2017
Accepted: 20 Feb 2018
Published online: 26 Jul 2019
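
Illustrative sketch: the pipeline outlined in the abstract (stemming, TF-IDF feature extraction, then a vote over classifier outputs) can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names are hypothetical, toy_stem is a simple suffix stripper standing in for a real stemmer (it is neither Lovins nor Snowball), the TF-IDF variant shown (relative term frequency times log(N/df)) is one common formulation that the abstract does not specify, and majority_vote assumes a plain majority over predicted labels.

```python
import math
from collections import Counter


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer (NOT Lovins or Snowball):
    # strips one common English suffix if at least 3 characters remain.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    # Lowercase, split on whitespace, and stem each token.
    return [toy_stem(w) for w in text.lower().split()]


def tfidf(docs):
    # Returns one {term: weight} dict per document, using
    # weight = (count / document length) * log(N / document frequency).
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def majority_vote(predictions):
    # predictions: one predicted label per classifier (assumed plain majority).
    # Ties go to the label that appears first in the list.
    return Counter(predictions).most_common(1)[0][0]
```

For example, majority_vote(["sport", "business", "sport"]) selects "sport". In tfidf, a stemmed term that occurs in every document receives weight 0, because log(N/df) is 0 when df equals N.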