Title: A hybrid random forest-based feature selection model using mutual information and F-score for preterm birth classification
Authors: Himani S. Deshpande; Leena Ragha
Addresses: Department of Computer Engineering, Ramrao Adik Institute of Technology, Mumbai, Maharashtra, India; Department of Computer Engineering, Ramrao Adik Institute of Technology, Mumbai, Maharashtra, India
Abstract: Every woman's body is unique, and certain features play a vital role in a healthy pregnancy; it is difficult to decide manually which features should be monitored to prevent pregnancy complications. In this work we consider 21 physical features of 903 women of varied age groups, economic status and health conditions. A variation and information-based random forest (VIBRF) hybrid model using mutual information and F-score is applied to evaluate each feature, considering both the variation within the feature and the mutual information across features. We experimented with various classifiers and observed that Gaussian naive Bayes (GNB) showed the most significant improvement in prediction accuracy, from 31% with all features to 80% with our feature selection process. Although SVM achieves a higher prediction accuracy of 84%, the AUC for GNB improved markedly, by 10%. Because this is a medical application, achieving a higher AUC is important, and we therefore conclude that GNB performs better with the proposed model.
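The two scoring criteria named in the abstract can be illustrated with a minimal sketch. This is not the authors' VIBRF implementation: it assumes that "F-score" refers to the Fisher-style F-score commonly used for binary feature selection (the paper may define it differently), it estimates mutual information between each feature and the class label after equal-width binning, and it combines the two scores by a simple average of min-max normalised values. The function names, the binning choice, the combination rule and the toy data are all hypothetical; the random forest stage and the downstream classifiers are not covered.

```python
import math
from collections import Counter
from statistics import mean, variance


def f_score(values, labels):
    """Fisher-style F-score of one feature for binary labels (0/1).

    Assumption: this is one common definition; the paper may use another.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    # Separation of the class means from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Spread inside each class (sample variances).
    within = variance(pos) + variance(neg)
    return between / within if within > 0 else 0.0


def mutual_information(values, labels, bins=4):
    """Mutual information (in nats) between a binned feature and the labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(values)
    joint = Counter(zip(binned, labels))
    px = Counter(binned)
    py = Counter(labels)
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def rank_features(columns, labels):
    """Rank features by the mean of normalised F-score and mutual information.

    The averaging rule is illustrative only, not taken from the paper.
    """
    names = list(columns)
    fs = [f_score(columns[k], labels) for k in names]
    mi = [mutual_information(columns[k], labels) for k in names]

    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

    combined = [(a + b) / 2 for a, b in zip(norm(fs), norm(mi))]
    return sorted(zip(names, combined), key=lambda t: t[1], reverse=True)


# Toy data (invented for illustration, not from the study).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "informative": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    "noise": [1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0],
}
ranking = rank_features(columns, labels)
```

In this toy setting the feature whose values separate the two classes ranks above the feature whose distribution is identical in both classes. In a fuller pipeline, the top-ranked features would then be passed to the classifiers under comparison.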
Keywords: feature selection; F-score; decision tree; random forest; hybrid model; preterm birth; classification.
DOI: 10.1504/IJMEI.2023.127257
International Journal of Medical Engineering and Informatics, 2023 Vol.15 No.1, pp.84 - 96
Received: 15 Sep 2020
Accepted: 17 Feb 2021
Published online: 30 Nov 2022