Ensemble feature selection approach for imbalanced textual data using MapReduce
Online publication date: Fri, 12-Nov-2021
by Houda Amazal; Mohammed Ramdani; Mohamed Kissi
International Journal of Business Intelligence and Data Mining (IJBIDM), Vol. 19, No. 4, 2021
Abstract: Feature selection is a fundamental pre-processing phase in text classification. It speeds up machine learning algorithms and improves classification accuracy. In a big data context, feature selection techniques have to deal with two major issues: very high dimensionality and class imbalance in the data. However, the libraries of big data frameworks such as Hadoop implement only a few single feature selection methods, whose robustness does not meet the requirements imposed by large amounts of data. To address this, we propose in this paper a distributed ensemble feature selection (DEFS) approach for large imbalanced datasets using MapReduce. A set of experiments was conducted on four datasets to confirm the improvement brought about by the proposed approach. The reported results show that, in most cases, our method yields better classification performance than other widely used feature selection techniques.
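The abstract does not say which selectors DEFS combines or how it aggregates them, so the following is only a minimal sketch of the general idea of ensemble feature selection in a map/reduce style, not the authors' method. It assumes two illustrative selectors (chi-square and document frequency) combined by mean rank, and it does not include any imbalance-specific weighting. All function names (`map_partition`, `reduce_counts`, `ensemble_select`, and so on) and the toy data are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_partition(docs):
    """Map step (hypothetical): per-partition document counts by (term, label)."""
    term_label, label_docs = Counter(), Counter()
    for tokens, label in docs:
        label_docs[label] += 1
        for term in set(tokens):  # count each term once per document
            term_label[(term, label)] += 1
    return term_label, label_docs


def reduce_counts(left, right):
    """Reduce step: sum the partial counts of two partitions."""
    return left[0] + right[0], left[1] + right[1]


def document_frequency(term_label):
    """Selector 1 (assumed): number of documents containing each term."""
    df = Counter()
    for (term, _label), count in term_label.items():
        df[term] += count
    return dict(df)


def chi_square(term_label, label_docs, label):
    """Selector 2 (assumed): chi-square of each term against one class."""
    n = sum(label_docs.values())
    scores = {}
    for term, present in document_frequency(term_label).items():
        a = term_label[(term, label)]   # term present, in class
        b = present - a                 # term present, other classes
        c = label_docs[label] - a       # term absent, in class
        d = n - a - b - c               # term absent, other classes
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def to_ranks(scores):
    """Rank terms from best (0) to worst; ties broken alphabetically."""
    ordered = sorted(scores, key=lambda t: (-scores[t], t))
    return {term: rank for rank, term in enumerate(ordered)}


def ensemble_select(score_tables, k):
    """Aggregate selectors by mean rank and keep the k best terms."""
    rank_tables = [to_ranks(s) for s in score_tables]
    mean_rank = {
        t: sum(r[t] for r in rank_tables) / len(rank_tables)
        for t in rank_tables[0]
    }
    return sorted(mean_rank, key=lambda t: (mean_rank[t], t))[:k]


# Toy data: two partitions of (tokens, label) pairs.
partitions = [
    [(["cheap", "pills", "buy"], "spam"), (["meeting", "today"], "ham")],
    [(["buy", "cheap"], "spam"), (["meeting", "notes", "today"], "ham"),
     (["today", "lunch"], "ham")],
]
term_label, label_docs = reduce(reduce_counts, map(map_partition, partitions))
selected = ensemble_select(
    [chi_square(term_label, label_docs, "spam"), document_frequency(term_label)],
    k=3,
)
print(selected)
```

Rank aggregation is used here because the two scores live on different scales; on a real cluster the map and reduce functions would run as Hadoop or Spark jobs rather than through Python's built-in `map` and `reduce`.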