Title: Machine learning classifiers with pre-processing techniques for rumour detection on social media: an empirical study
Authors: Mohammed Al-Sarem; Muna Al-Harby; Faisal Saeed; Essa Abdullah Hezzam
Addresses: College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia; Saba'a Region University, Mareb, Yemen ' Information System Department, College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia ' Information System Department, College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia ' Information System Department, College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
Abstract: The rapid rise in the popularity of social media has made it easy for users to post and share information with others. However, owing to the uncontrolled nature of platforms such as Twitter and Facebook, it has also become easy to post fake news and misleading information. The task of detecting such content is known as rumour detection. This task requires data analytics tools because of the massive amount of shared content and the speed at which it is generated. In this work, the authors study the impact of different text pre-processing techniques on the performance of classifiers for rumour detection. The experiments were performed on a dataset of tweets about emerging breaking news stories covering several events in the Saudi political context (EBNS-SPC). The results show that pre-processing techniques significantly improve the performance of machine learning methods such as support vector machine (SVM), multinomial naïve Bayes (MNB), and K-nearest neighbour (KNN) classifiers. However, the classifiers respond differently to different combinations of pre-processing techniques.
Keywords: rumour detection; Saudi Arabian news; multinomial naïve Bayes; MNB; support vector machine; SVM; K-nearest neighbour; KNN; Twitter analysis.
International Journal of Cloud Computing, 2022 Vol.11 No.4, pp.330 - 344
Received: 25 Nov 2019
Accepted: 09 Feb 2020
Published online: 09 Aug 2022
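
The abstract describes comparing classifiers under different combinations of pre-processing techniques. A minimal Python sketch of how such combinations might be enumerated and applied to a tweet follows. The abstract does not list the techniques used in the study, so the four steps here (`strip_urls`, `strip_mentions`, `lowercase`, `remove_stop_words`), the toy stop-word list, and the English sample tweet are all illustrative assumptions, not the authors' pipeline.

```python
import itertools
import re

# Hypothetical pre-processing steps, chosen for illustration only;
# the abstract does not say which techniques the study evaluated.
def strip_urls(text):
    return re.sub(r"https?://\S+", " ", text)

def strip_mentions(text):
    return re.sub(r"@\w+", " ", text)

def lowercase(text):
    return text.lower()

STOP_WORDS = {"the", "is", "a", "of"}  # toy list, not a real stop-word resource

def remove_stop_words(text):
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)

STEPS = [strip_urls, strip_mentions, lowercase, remove_stop_words]

def step_combinations(steps):
    """Yield every subset of steps (including the empty one), keeping list order."""
    for size in range(len(steps) + 1):
        yield from itertools.combinations(steps, size)

def preprocess(text, combo):
    """Apply the chosen steps in order, then normalise whitespace."""
    for step in combo:
        text = step(text)
    return " ".join(text.split())

if __name__ == "__main__":
    tweet = "BREAKING: the Minister is resigning https://example.com @newsdesk"
    for combo in step_combinations(STEPS):
        label = "+".join(f.__name__ for f in combo) or "raw"
        print(f"{label:60s} -> {preprocess(tweet, combo)}")
```

In a setup of this kind, each pre-processed variant of the dataset would then be vectorised and passed to each classifier (for example SVM, MNB, and KNN), so that scores can be compared per combination. That stage is omitted here because it depends on the dataset and the feature extraction chosen.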