Title: FE-TAC: an effective document classification method combining feature extraction and feature selection

Authors: Kshetrimayum Nareshkumar Singh; Haobam Mamata Devi; Anjana Kakoti Mahant; Ahongsangbam Dorendro

Addresses: Department of Computer Science, Manipur University, Canchipur, Imphal East District, 795003 Manipur, India ' Department of Computer Science, Manipur University, Canchipur, Imphal East District, 795003 Manipur, India ' Department of Computer Science, Gauhati University, 781014 Guwahati, India ' Department of Computer Science, Manipur University, Canchipur, Imphal East District, 795003 Manipur, India

Abstract: An effective classification method requires the most informative and relevant set of features. In this paper, we discuss an enhanced text classification method that combines feature extraction (FE) and feature selection. First, we use the FE method to extract features from text data, and then we apply the feature selection method to select the most relevant of the extracted features. During feature selection, we introduce a new measure called term affinity to the class (TAC), which estimates the degree to which a term is retained as a member of a particular class. TAC is computed by combining the normalised document frequency of the term with the sum of its occurrence frequencies in the specific class. Experimental results on three existing datasets (BBC, Classic4 and 20 Newsgroups) and on our own dataset, called 'Sangai', show that the proposed method outperforms the other competing methods in terms of accuracy.
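
The abstract does not state the TAC formula, so the following is only an illustrative sketch of how a class-affinity score of this kind might be computed, not the authors' definition. It assumes, purely for illustration, that the score for a term in a class is the product of (a) the fraction of that class's documents containing the term and (b) the share of the term's total occurrences that fall in that class. The function names `tac_like_scores` and `select_top_terms`, and the toy documents, are hypothetical.

```python
from collections import Counter, defaultdict


def tac_like_scores(docs, labels):
    """Return {class: {term: score}} for a hypothetical TAC-like score.

    ASSUMPTION (not taken from the paper):
      score = (class documents containing the term / class documents)
            * (term occurrences in the class / term occurrences overall)
    """
    docs_per_class = Counter(labels)
    doc_freq = defaultdict(Counter)   # class -> term -> docs in class containing term
    term_freq = defaultdict(Counter)  # class -> term -> occurrences summed over class
    total_freq = Counter()            # term -> occurrences summed over whole corpus

    for tokens, label in zip(docs, labels):
        for term, n in Counter(tokens).items():
            doc_freq[label][term] += 1
            term_freq[label][term] += n
            total_freq[term] += n

    scores = {}
    for label, n_docs in docs_per_class.items():
        scores[label] = {
            term: (doc_freq[label][term] / n_docs)
            * (term_freq[label][term] / total_freq[term])
            for term in term_freq[label]
        }
    return scores


def select_top_terms(scores, k):
    """Keep the k highest-scoring terms of every class (ties broken alphabetically)."""
    selected = set()
    for class_scores in scores.values():
        ranked = sorted(class_scores, key=lambda t: (-class_scores[t], t))
        selected.update(ranked[:k])
    return selected


# Toy, already-tokenised documents (hypothetical data).
docs = [
    ["goal", "match", "goal"],
    ["match", "team"],
    ["vote", "election"],
    ["election", "team"],
]
labels = ["sport", "sport", "politics", "politics"]

scores = tac_like_scores(docs, labels)
selected = select_top_terms(scores, k=1)
```

In this sketch a term such as "team", which is spread evenly across both classes, scores lower than a term concentrated in one class, which is the behaviour a class-affinity measure is meant to capture. The selected terms would then restrict the extracted feature set before a classifier is trained.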

Keywords: bag of words; BoW; document representation; term weights; text classification; word vectors.

DOI: 10.1504/IJADS.2023.134204

International Journal of Applied Decision Sciences, 2023 Vol.16 No.6, pp.717 - 740

Received: 18 Mar 2022
Accepted: 24 Jun 2022

Published online: 13 Oct 2023

