Article: Ensemble-based software fault prediction with two staged data pre-processing
Journal: International Journal of Computer Applications in Technology (IJCAT), 2023, Vol. 72, No. 3, pp. 212-222
Publisher: Inderscience Publishers

Authors: Shubham P. Kulkarni; Sanjeev Patel

Addresses: Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Odisha, India (both authors)

Abstract: Software fault prediction is the process of identifying the software modules that are more likely to be defective or faulty before the testing phase of the software development life cycle. The software metric values of the modules of a known software project are used to train the software fault prediction model. Our objective is to apply ensemble-based models to software fault data sets, combined with feature selection and data re-sampling techniques, to improve prediction performance. In this paper, we design a two-stage data pre-processing technique that is applied to the data set before it is passed to the ensemble-based model for training. The two-stage pre-processing model outperforms the standard ensemble-based model, giving an improvement of 1 to 6% for every classifier used, viz. Bagging, Dagging, Rotation Forest, Random Forest and AdaBoost.
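
The pipeline described in the abstract (pre-process the metric data in two stages, then train an ensemble) can be sketched as below. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: it assumes the two stages are feature selection followed by SMOTE re-sampling (the abstract names both techniques and the keywords name SMOTE, but the order and the specific selector are assumptions); the correlation filter `select_features`, the round-robin `smote` variant, the decision-stump `bagging` ensemble and the toy data are all hypothetical stand-ins. In practice, library implementations such as `imblearn.over_sampling.SMOTE` and scikit-learn's `BaggingClassifier` would be the more likely choice.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def select_features(X, y, k):
    """Stage 1 (hypothetical filter): indices of the k metrics most correlated with the label."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return sorted(top)  # keep the original column order


def smote(X, y, minority=1, k_neighbors=3, rng=None):
    """Stage 2: SMOTE-style oversampling until the minority class matches the majority."""
    rng = rng or random.Random(0)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    if len(minority_rows) < 2:
        raise ValueError("SMOTE needs at least two minority-class rows")
    needed = max(Counter(y).values()) - len(minority_rows)
    synthetic = []
    for i in range(needed):
        base = minority_rows[i % len(minority_rows)]  # cycle over minority rows
        # nearest minority neighbours of `base` by squared Euclidean distance
        others = sorted((r for r in minority_rows if r is not base),
                        key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        neighbour = rng.choice(others[:k_neighbors])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return X + synthetic, y + [minority] * len(synthetic)


def fit_stump(X, y):
    """One-feature threshold rule with the best training accuracy (labels are 0/1)."""
    best = (-1.0, 0, 0.0, 1)  # (accuracy, feature, threshold, label when value > threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for above in (0, 1):
                hits = sum((above if row[j] > t else 1 - above) == label
                           for row, label in zip(X, y))
                if hits / len(y) > best[0]:
                    best = (hits / len(y), j, t, above)
    _, j, t, above = best
    return lambda row: above if row[j] > t else 1 - above


def bagging(X, y, n_estimators=11, rng=None):
    """Bagging: each stump is fit on a bootstrap sample; prediction is a majority vote."""
    rng = rng or random.Random(0)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: Counter(m(row) for m in models).most_common(1)[0][0]


def two_stage_pipeline(X, y, k_features=2):
    """Feature selection -> SMOTE -> ensemble training; returns a predictor on raw rows."""
    # In an evaluation, call this on the training split only: test modules must not be oversampled.
    keep = select_features(X, y, k_features)
    X_sel = [[row[j] for j in keep] for row in X]
    X_bal, y_bal = smote(X_sel, y)
    model = bagging(X_bal, y_bal)
    return keep, X_bal, y_bal, lambda row: model([row[j] for j in keep])


if __name__ == "__main__":
    # Toy data: 4 hypothetical module metrics per row; label 1 = faulty.
    X = [[10, 1, 5, 0.2], [12, 2, 3, 0.1], [8, 1, 4, 0.3], [15, 2, 6, 0.2],
         [11, 1, 2, 0.4], [9, 2, 5, 0.1], [14, 1, 3, 0.3],
         [80, 9, 4, 0.2], [95, 11, 6, 0.1], [70, 8, 3, 0.3]]
    y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    keep, X_bal, y_bal, predict = two_stage_pipeline(X, y)
    print(keep, Counter(y_bal), predict([90, 10, 5, 0.2]))
```

The sketch deliberately avoids third-party packages so each stage stays visible: the filter shrinks the metric set, SMOTE balances faulty and non-faulty modules by interpolating between nearby faulty ones, and only then is the ensemble trained. Swapping the stump ensemble for Random Forest, AdaBoost or another of the listed classifiers would leave the two pre-processing stages unchanged.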

Keywords: software fault prediction; ensemble-based model; SMOTE; feature selection.

DOI: 10.1504/IJCAT.2023.133297

Received: 23 Jul 2022
Received in revised form: 07 Nov 2022
Accepted: 14 Dec 2022
Published online: 11 Sep 2023