Title: Assessing ensemble techniques for imbalanced classification

Authors: Eric P. Jiang

Addresses: University of San Diego, 5998 Alcala Park, San Diego, California 92110, USA

Abstract: Class imbalance is a pervasive and challenging problem in machine learning that arises in a wide range of real-world applications where the distribution of data across classes is highly skewed. Conventional machine learning algorithms tend to favour the majority classes and often fail to capture the data patterns of the minority classes, a bias that can lead to undesirable outcomes in practice. This paper addresses the problem of class imbalance through a comprehensive comparative study of various hybrid ensemble approaches that show promise in mitigating it. The study comprises extensive experiments on a diverse collection of datasets drawn from multiple application domains and spanning a wide range of class imbalance ratios. To assess how well these methods handle imbalanced data, we use a combination of relevant, commonly used performance metrics, and we apply multiple non-parametric statistical tests to evaluate, analyse and compare the results of the selected methods. In doing so, we aim to offer practical insights into which methods are better suited to specific contexts, helping practitioners select appropriate approaches for addressing class imbalance in their machine learning tasks.
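The abstract does not name its metrics or statistical tests, so the sketch below is illustrative only and is not the paper's evaluation code. It assumes two choices that are common in the imbalanced-learning literature: the G-mean (geometric mean of minority- and majority-class recall) as an imbalance-aware metric, and the Friedman test over per-dataset ranks as a non-parametric comparison of several methods across several datasets. All function names (`imbalance_ratio`, `binary_metrics`, `average_ranks`, `friedman_statistic`) are hypothetical.

```python
from collections import Counter
import math

def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, per-class recall and G-mean for a binary task.

    Assumption: `positive` is the minority class label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # minority-class recall
    specificity = tn / (tn + fp) if tn + fp else 0.0  # majority-class recall
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }

def average_ranks(scores):
    """scores[i][j] is the score of method j on dataset i (higher is better).

    Returns each method's rank averaged over datasets (rank 1 = best);
    tied methods share the mean of the ranks they occupy.
    """
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda j: -row[j])
        pos = 0
        while pos < n_methods:
            end = pos
            # Extend the block of tied scores starting at `pos`.
            while end + 1 < n_methods and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1 .. end+1
            for k in range(pos, end + 1):
                totals[order[k]] += shared
            pos = end + 1
    return [t / len(scores) for t in totals]

def friedman_statistic(scores):
    """Friedman chi-square statistic for N datasets and k methods.

    Compared against a chi-square distribution with k - 1 degrees of freedom.
    """
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)
```

For instance, a classifier that assigns every sample of a 95:5 dataset (imbalance ratio 19) to the majority class reaches an accuracy of 0.95 but a G-mean of 0, which is the majority-class bias the abstract describes. A significant Friedman statistic would normally be followed by a post-hoc test to locate which methods differ.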

Keywords: learning from imbalanced data; data rebalancing; ensemble learning; performance evaluation and comparison.

DOI: 10.1504/IJBIDM.2025.143927

International Journal of Business Intelligence and Data Mining, 2025, Vol. 26, No. 1/2, pp. 66-87

Received: 12 Nov 2023
Accepted: 07 May 2024

Published online: 14 Jan 2025
