Title: Ensemble-based software fault prediction with two staged data pre-processing
Authors: Shubham P. Kulkarni; Sanjeev Patel
Addresses: Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Odisha, India; Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Odisha, India
Abstract: Software fault prediction is the process of identifying, before the testing phase of the software development life cycle, the software modules that are more likely to be defective or faulty. We use the software metric values of modules from a known software project to train the software fault prediction model. Our objective is to apply ensemble-based models to software fault data sets, together with feature selection and data re-sampling techniques, to achieve improved performance. In this paper, we design a two-stage data pre-processing technique that is applied to the data set before it is passed to the ensemble-based model for training. We find that the model with two-stage pre-processing outperforms the general ensemble-based model, giving an improvement of 1 to 6% for all of the classifiers used, viz. Bagging, Dagging, Rotation Forest, Random Forest and AdaBoost.
Keywords: software fault prediction; ensemble-based model; SMOTE; feature selection.
DOI: 10.1504/IJCAT.2023.133297
International Journal of Computer Applications in Technology, 2023 Vol.72 No.3, pp.212 - 222
Received: 23 Jul 2022
Received in revised form: 07 Nov 2022
Accepted: 14 Dec 2022
Published online: 11 Sep 2023
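The two-stage pre-processing named in the abstract (feature selection plus data re-sampling, with SMOTE listed in the keywords) can be illustrated with a minimal sketch. The abstract does not state which feature selection method the authors use or in which order the stages run, so the correlation-based filter, the order (selection first, then re-sampling), the toy metric values, and the helper names `select_features` and `smote` below are all assumptions made for illustration, not the paper's implementation. The balanced output would then be handed to an ensemble classifier such as Bagging or Random Forest, which is not shown here.

```python
import math
import random

def pearson_abs(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no signal
    return abs(cov / math.sqrt(vx * vy))

def select_features(X, y, k):
    """Stage 1 (assumed filter method): keep the k metrics most correlated with the fault label."""
    scores = [pearson_abs([row[j] for row in X], y) for j in range(len(X[0]))]
    keep = sorted(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])
    return [[row[j] for j in keep] for row in X], keep

def smote(X, y, minority=1, k=2, seed=0):
    """Stage 2: SMOTE-style oversampling until both classes have the same size."""
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    needed = (len(X) - len(minority_rows)) - len(minority_rows)
    X_new, y_new = [list(row) for row in X], list(y)
    if len(minority_rows) < 2:
        return X_new, y_new  # interpolation needs at least two minority samples
    for _ in range(max(needed, 0)):
        base = rng.choice(minority_rows)
        # k nearest minority neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (row for row in minority_rows if row is not base),
            key=lambda row: math.dist(row, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and neighbour
        X_new.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new

# Hypothetical module metrics: [lines of code, cyclomatic complexity, unrelated metric]
X = [
    [10, 1, 5], [12, 1, 3], [15, 2, 4], [20, 2, 5], [22, 3, 3], [25, 3, 4],
    [90, 9, 4], [95, 10, 5], [99, 11, 3],
]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # 1 = faulty module (minority class)

X_sel, kept = select_features(X, y, k=2)   # stage 1: drop the uninformative metric
X_bal, y_bal = smote(X_sel, y)             # stage 2: balance faulty vs. non-faulty modules
```

Running feature selection before re-sampling keeps the synthetic samples from being interpolated along metrics that are later discarded; the reverse order is equally easy to express by swapping the last two calls.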