Reducing feature selection bias using a model independent performance measure
by Weizeng Ni; Nuo Xu; Honghao Dai; Samuel H. Huang
International Journal of Data Science (IJDS), Vol. 5, No. 3, 2020

Abstract: Feature selection is an important and challenging step in learning from data with small sample size and high dimensionality. The widely used wrapper approach can introduce feature selection bias because it overfits the data. More sophisticated approaches, external cross-validation and dual-loop cross-validation, have been proposed to reduce this bias, but they tend to introduce excessive variability on small-sample data. This paper shows that a model-independent approach, namely minimum expected cost of misclassification (MECM), can reduce feature selection bias without cross-validation. An experiment on a synthetic dataset shows that a wrapper based on 10-fold dual-loop cross-validation has an error rate around 33% higher than the noise-free error rate and fails to identify the discriminative features consistently across all 10 folds. MECM, on the other hand, selects more discriminative features and is more robust to the choice of classification model. A real-world colon cancer dataset is further used to demonstrate the effectiveness of MECM.
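
The feature selection bias mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's MECM method or its experiment. It assumes pure-noise synthetic data (so the true error rate of any classifier is 50%), substitutes a simple mean-difference filter for a wrapper, and uses a nearest-centroid classifier; all function names and parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def top_k_features(X, y, k):
    """Rank features by absolute difference of class means; return top-k indices."""
    diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(diff)[-k:]

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class with the closer training centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def cv_error(X, y, k, folds, select_inside_fold):
    """Cross-validated error rate with selection done inside or outside the folds."""
    all_idx = np.arange(len(y))
    # Selecting on all samples lets the held-out labels leak into the feature choice.
    global_feats = top_k_features(X, y, k)
    errors = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(all_idx, test_idx)
        if select_inside_fold:
            feats = top_k_features(X[train_idx], y[train_idx], k)
        else:
            feats = global_feats
        pred = nearest_centroid_predict(
            X[train_idx][:, feats], y[train_idx], X[test_idx][:, feats]
        )
        errors += int((pred != y[test_idx]).sum())
    return errors / len(y)

# Assumed setup: 50 samples, 2000 pure-noise features, balanced random labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2000))
y = np.array([0, 1] * 25)
folds = np.array_split(rng.permutation(len(y)), 10)

biased_err = cv_error(X, y, k=10, folds=folds, select_inside_fold=False)
unbiased_err = cv_error(X, y, k=10, folds=folds, select_inside_fold=True)
print(f"selection outside CV: {biased_err:.2f}")
print(f"selection inside CV:  {unbiased_err:.2f}")
```

Because the features carry no signal, an honest estimate should sit near 0.5. Selecting features once on the full dataset typically yields a much lower, optimistic estimate, whereas repeating the selection inside each training fold removes that optimism at the cost of the higher variability on small samples that the abstract notes.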

Online publication date: Tue, 16-Feb-2021
