Title: FE-TAC: an effective document classification method combining feature extraction and feature selection
Authors: Kshetrimayum Nareshkumar Singh; Haobam Mamata Devi; Anjana Kakoti Mahant; Ahongsangbam Dorendro
Addresses: Department of Computer Science, Manipur University, Canchipur, Imphal East District, 795003 Manipur, India; Department of Computer Science, Manipur University, Canchipur, Imphal East District, 795003 Manipur, India; Department of Computer Science, Gauhati University, 781014 Guwahati, India; Department of Computer Science, Manipur University, Canchipur, Imphal East District, 795003 Manipur, India
Abstract: An effective classification method requires the most informative and relevant set of features. In this paper, we present an enhanced text classification method that combines feature extraction (FE) and feature selection. First, we use the FE method to extract features from the text data, and then we apply the feature selection method to select the most relevant of the extracted features. For feature selection, we introduce a new measure called term affinity to the class (TAC), which estimates how strongly a term is retained as a member of a particular class. TAC is computed by combining the normalised document frequency of the term with the summed occurrence frequency of the term in the specific class. Experimental results on three existing datasets (BBC, Classic4 and 20 Newsgroups) and on our own dataset, called 'Sangai', show that the proposed method outperforms the competing methods in terms of accuracy.
Keywords: bag of words; BoW; document representation; term weights; text classification; word vectors.
DOI: 10.1504/IJADS.2023.134204
International Journal of Applied Decision Sciences, 2023 Vol.16 No.6, pp.717 - 740
Received: 18 Mar 2022
Accepted: 24 Jun 2022
Published online: 13 Oct 2023
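
Illustrative sketch: The abstract names the two ingredients of TAC (a normalised document frequency and a summed per-class occurrence frequency) but does not give the formula. The Python sketch below is therefore not the authors' measure. It is a hypothetical stand-in that multiplies the two ingredients, using whitespace tokenisation as the simplest possible bag-of-words extraction step. The names `affinity_scores`, `select_features`, and `DOCS`, the product form of the score, and the top-k selection rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy labelled corpus (hypothetical data, not one of the paper's datasets).
DOCS = [
    ("sport", "goal match goal team"),
    ("sport", "match team win"),
    ("tech", "chip code team"),
    ("tech", "code code chip"),
]


def affinity_scores(labelled_docs):
    """Return {(term, label): score} for every term seen in each class.

    ASSUMPTION: score = normalised document frequency in the class
    times the class's share of the term's total occurrences. This is a
    stand-in for TAC, whose exact formula the abstract does not state.
    """
    docs_per_class = Counter()
    doc_freq = defaultdict(Counter)   # label -> term -> docs containing term
    term_freq = defaultdict(Counter)  # label -> term -> summed occurrences
    total_freq = Counter()            # term -> occurrences over all classes

    for label, text in labelled_docs:
        tokens = text.lower().split()  # minimal bag-of-words extraction
        docs_per_class[label] += 1
        doc_freq[label].update(set(tokens))
        term_freq[label].update(tokens)
        total_freq.update(tokens)

    scores = {}
    for label, counts in term_freq.items():
        for term, freq in counts.items():
            norm_df = doc_freq[label][term] / docs_per_class[label]
            share = freq / total_freq[term]
            scores[(term, label)] = norm_df * share
    return scores


def select_features(scores, k):
    """Keep the k highest-scoring terms of each class (ties broken A-Z)."""
    by_class = defaultdict(list)
    for (term, label), score in scores.items():
        by_class[label].append((-score, term))
    selected = set()
    for ranked in by_class.values():
        selected.update(term for _, term in sorted(ranked)[:k])
    return selected
```

Calling `select_features(affinity_scores(DOCS), 2)` on the toy corpus keeps `match`, `team`, `chip`, and `code`. The term `team` appears in both classes, so it scores lower for `sport` than the class-exclusive `match` does and would be the first term dropped under a tighter cut-off.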