Title: Spam email classification and sentiment analysis based on semantic similarity methods
Authors: Ulligaddala Srinivasarao; Aakanksha Sharaff
Addresses: Department of Computer Science and Engineering, National Institute of Technology Raipur, Chhattisgarh, India; Department of Computer Science and Engineering, National Institute of Technology Raipur, Chhattisgarh, India
Abstract: Electronic mail is widely used for communication, and a spam filter is needed to save storage and protect against security threats. Various natural language processing (NLP) techniques have been used to improve spam detection efficiency. Existing approaches, however, cannot handle unbalanced classes and suffer from lower efficiency because they extract irrelevant features. In this research, sentiment analysis-based semantic feature extraction (FE) and hybrid feature selection (FS) techniques were used to improve the detection of spam and non-spam e-mail. The sentiment analysis measures the polarity of the input text, which is then used for e-mail spam classification. Different semantic similarity feature extraction methods are used in the proposed method, together with TF-IDF, Information Gain, and the Gini Index. The proposed semantic similarity and hybrid FS techniques were evaluated with various classifiers. The experimental analysis shows that the combination of Gini index FS, word2vec FE, and a support vector machine (SVM) classifier achieves the highest performance, 95.17%, while random forest (RF) with the Gini index and word2vec achieves 93.3% accuracy in e-mail spam detection.
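The abstract names Information Gain and the Gini Index as feature-selection criteria without defining them. The sketch below shows one common formulation of each, assuming binary term presence and impurity reduction as used in decision-tree splitting; the paper may use a different variant (for example, the "improved" Gini index for text). This is an illustration, not the authors' implementation, and the function names (`split_score`, `rank_terms`) and the toy e-mails are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_c p_c^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def split_score(docs, labels, term, impurity):
    """Impurity reduction from splitting the documents on presence of `term`.

    With impurity=entropy this is Information Gain; with impurity=gini it is
    the Gini-impurity reduction (an assumed formulation of the Gini index FS).
    """
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    weighted = (len(with_t) / n) * impurity(with_t) + (
        len(without_t) / n
    ) * impurity(without_t)
    return impurity(labels) - weighted


def rank_terms(docs, labels, impurity, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    scored = [(split_score(docs, labels, t, impurity), t) for t in vocab]
    scored.sort(key=lambda st: (-st[0], st[1]))
    return [t for _, t in scored[:k]]


# Hypothetical toy corpus: each e-mail becomes a set of lowercase tokens.
emails = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes", "ham"),
]
docs = [set(text.split()) for text, _ in emails]
labels = [y for _, y in emails]

print(rank_terms(docs, labels, gini, 3))     # ['free', 'meeting', 'now']
print(rank_terms(docs, labels, entropy, 3))  # ['free', 'meeting', 'now']
```

Terms that split the two classes perfectly ("free", "now", "meeting") score highest under both criteria, while terms that occur in a single e-mail score lower. In a full pipeline the selected terms would feed a feature extractor (TF-IDF or word2vec) and a classifier such as an SVM, which this sketch does not cover.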
Keywords: artificial neural network; ANN; hybrid feature selection; HFS; semantic similarity; SVM; TF-IDF.
DOI: 10.1504/IJCSE.2023.129147
International Journal of Computational Science and Engineering, 2023 Vol.26 No.1, pp.65 - 77
Received: 13 Apr 2021
Accepted: 19 Oct 2021
Published online: 23 Feb 2023