Title: Imbalanced big data classification model using social spider-cat swarm optimisation weighted incremental learning ensemble classifier
Authors: Vikas Gajananrao Bhowate; T. Hanumantha Reddy
Addresses: St. Vincent Pallotti College of Engineering and Technology, An Autonomous Institution Accredited by NAAC with 'A' Grade, Gavsi Manapur, Wardha Road, Nagpur, Maharashtra-441108, India; Computer Science and Engineering, Rao Bahadur Y. Mahabaleshwarappa Engineering College, Affiliated to VTU, Belgaum, Approved by AICTE, New Delhi and NAAC B++ Accredited, Cantonment, Ballari, 583104 Karnataka, India
Abstract: Traditional machine learning strategies for data classification fail when handling highly imbalanced big data. This research proposes an incremental learning-based ensemble classifier for imbalanced data classification. First, the big data is pre-processed, and synthetic samples are then generated using a data balancing strategy based on the synthetic minority oversampling technique (SMOTE). The balanced big data is handled using the MapReduce architecture, which incorporates the proposed incremental learning-based ensemble classifier comprising an artificial neural network (ANN), K-nearest neighbour (K-NN), support vector machine (SVM), decision tree (DT) and naïve Bayes (NB) classifier. The class output of the proposed method is generated through a fusion parameter, which is determined using the proposed social spider-cat swarm optimisation (SSPCS) algorithm. The proposed method attained an accuracy of 94.37%, a sensitivity of 97%, and a specificity of 97%, which shows the effectiveness of the proposed strategy in imbalanced data classification.
Keywords: ensemble classifier; data imbalance classification; MapReduce framework; incremental learning; optimisation.
DOI: 10.1504/IJIIDS.2022.124091
International Journal of Intelligent Information and Database Systems, 2022 Vol.15 No.3, pp.311 - 340
Received: 03 Aug 2021
Accepted: 21 Oct 2021
Published online: 12 Jul 2022
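
Illustrative note: the abstract names SMOTE as the data balancing step but gives no implementation details. The sketch below shows only the generic SMOTE interpolation idea (a synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours). The function name `smote_sample`, the parameter `k`, and the toy data are assumptions made for illustration; this is not the authors' code and does not cover the MapReduce, ensemble, or SSPCS parts of the method.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Return one synthetic minority point via SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    k: number of nearest minority neighbours to choose from (hypothetical default).
    """
    rng = rng or random.Random()
    base = rng.choice(minority)
    # Rank the other minority points by squared Euclidean distance to the base point.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    # Interpolate at a random position between the base point and the neighbour.
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]


# Toy minority class: the four corners of the unit square (made-up data).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_sample(toy_minority, k=2, rng=random.Random(0))
```

Because each synthetic point is a convex combination of two existing minority samples, it always lies within the bounding box of the minority class, which is why the toy result stays inside the unit square.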