Title: An improved ACO-based decision tree algorithm for imbalanced datasets
Authors: Muhamad Hasbullah Mohd Razali; Rizauddin Saian; Yap Bee Wah; Ku Ruhana Ku-Mahamud
Addresses: Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Perlis Branch, Arau Campus, 02600, Perlis, Malaysia; Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Perlis Branch, Arau Campus, 02600, Perlis, Malaysia; Advanced Analytics Engineering Centre and Center of Statistical and Decision Sciences, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia; School of Computing, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia
Abstract: Predicting the minority class is challenging when classifying datasets with a skewed class distribution. Bio-inspired classifiers such as the ant colony optimisation (ACO) decision tree cannot provide effective decision boundaries because their entropy-based heuristics are affected by the strong presence of the majority class. Consequently, the resulting trees are dominated by the likelihood of the majority class, while the rare class is inadequately represented. This study proposes an improved algorithm called Hellinger-ant-tree-miner (HATM), which is inspired by the ACO metaheuristic, for imbalanced learning with decision tree classification. The proposed algorithm was compared with the existing algorithm that uses entropy-based heuristics, ant-tree-miner (ATM), on nine publicly available imbalanced datasets and in a simulation study. The simulation reveals the superiority of HATM in an imbalanced class environment as the sample size increases. Experimental results show that performance, evaluated via minority class prediction (MCP) and F-measure, improves over the existing algorithm owing to the class skew-insensitivity of the Hellinger distance. The statistical significance test shows that HATM has a higher mean than ATM in both performance measures, indicating a potential improvement of the ACO decision tree structure for the imbalanced class domain.
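The skew-insensitivity claimed for the Hellinger distance can be illustrated with a minimal sketch. The abstract does not spell out how HATM computes its heuristic, so the code below is an assumption: it uses the standard Hellinger distance between the two class-conditional distributions over the branches of a candidate binary-class split, and compares it with ordinary entropy-based information gain. The function names (`hellinger_distance`, `information_gain`) and the example counts are hypothetical and chosen only for illustration.

```python
import math


def hellinger_distance(pos_counts, neg_counts):
    """Hellinger distance between the class-conditional branch distributions.

    pos_counts[j] / neg_counts[j] are the numbers of minority / majority
    instances that fall into branch j of a candidate split (illustrative
    inputs, not the data structures of the HATM paper).
    """
    pos_total = sum(pos_counts)
    neg_total = sum(neg_counts)
    if pos_total == 0 or neg_total == 0:
        return 0.0  # undefined without both classes; treat as uninformative
    return math.sqrt(sum(
        (math.sqrt(p / pos_total) - math.sqrt(n / neg_total)) ** 2
        for p, n in zip(pos_counts, neg_counts)
    ))


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(pos_counts, neg_counts):
    """Entropy-based gain of the same split, for comparison."""
    total = sum(pos_counts) + sum(neg_counts)
    parent = entropy([sum(pos_counts), sum(neg_counts)])
    children = sum(
        ((p + n) / total) * entropy([p, n])
        for p, n in zip(pos_counts, neg_counts)
        if p + n > 0
    )
    return parent - children


if __name__ == "__main__":
    minority = [8, 2]            # minority instances per branch
    majority = [20, 80]          # majority instances per branch
    majority_x10 = [200, 800]    # same branch proportions, ten times the skew

    print(hellinger_distance(minority, majority),
          hellinger_distance(minority, majority_x10))
    print(information_gain(minority, majority),
          information_gain(minority, majority_x10))
```

Because the Hellinger distance depends only on the within-class branch proportions, multiplying the majority counts by ten leaves it unchanged, whereas the information gain shifts with the class prior. This is the property the abstract refers to as skew-insensitivity; how it is embedded in the ant colony construction procedure is not described here and is not modelled by this sketch.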
Keywords: ant colony optimisation; ACO; decision tree; classification; Hellinger distance; imbalanced learning; skew-insensitive; heuristics.
DOI: 10.1504/IJMMNO.2021.118402
International Journal of Mathematical Modelling and Numerical Optimisation, 2021 Vol.11 No.4, pp.412 - 427
Received: 18 Apr 2020
Accepted: 23 Jan 2021
Published online: 25 Oct 2021