Title: Performance evaluation of oversampling algorithm: MAHAKIL using ensemble classifiers
Authors: C. Arun; C. Lakshmi
Addresses: Department of Computational Intelligence, School of Computing, SRMIST, Chennai, Tamil Nadu, India; School of Computing, SRM Institute of Science and Technology, India
Abstract: Class imbalance is a well-known problem in real-world applications: a disparity in the number of samples across classes leads to biased classifier performance. Many sampling techniques have been proposed to address it, falling into either oversampling, which resolves the issue to a greater extent, or undersampling. MAHAKIL is a diversity-based oversampling approach inspired by the theory of inheritance, in which minority samples are synthesised to balance the classes using the Mahalanobis distance measure. In this study, the performance of the MAHAKIL algorithm is tested using various ensemble classifiers, which have proved effective owing to their multi-hypothesis learning approach and better performance. The results of experiments conducted on 20 imbalanced software defect prediction datasets using six different ensemble approaches show that XGBoost provides better performance and a reduced false alarm rate compared to the other models.
Keywords: class imbalance; software fault prediction; synthetic samples; oversampling techniques; MAHAKIL; false alarm rate; evolutionary algorithm; ensemble; inheritance
DOI: 10.1504/IJBIDM.2023.127293
International Journal of Business Intelligence and Data Mining, 2023 Vol.22 No.1/2, pp.1 - 15
Received: 26 Aug 2021
Accepted: 17 Sep 2021
Published online: 30 Nov 2022