Class imbalance and its effect on PCA preprocessing
Online publication date: Sat, 30-Aug-2014
by T. Maruthi Padmaja; Bapi S. Raju; Rudra N. Hota; P. Radha Krishna
International Journal of Knowledge Engineering and Soft Data Paradigms (IJKESDP), Vol. 4, No. 3, 2014
Abstract: The performance of classification models is prone to the class imbalance problem, which occurs when one class of data severely outnumbers the other class. Solutions have been proposed at both the data level and the algorithm level to improve model performance under this condition. Among these, resampling solutions, which preprocess the class distribution at the data level, have been successfully applied to many real-world class imbalance problems. Separately, principal component analysis (PCA) is one of the prominent preprocessing solutions for improving classifier performance. PCA constructs a new subspace from the original attributes by maximising the global variance. This work explores the effect of class imbalance on the reduced subspace generated by PCA for the two-class classification problem. Initially, the effect of class imbalance on PCA preprocessing is studied on synthetic datasets. The results obtained are then validated on ten real-world datasets. This study reveals two major findings: 1) whenever the angular separation between the respective principal axes of the majority and minority classes is large, the data imbalance clearly affects the minority class prediction accuracy as well as the reconstruction of minority class data from the principal eigenvectors of the combined dataset; 2) balancing the class distribution is more crucial for improving the classifier's performance than PCA preprocessing.
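The first finding can be illustrated with a minimal sketch. This is not the authors' experimental procedure: the function names, the two-dimensional Gaussian data, the class sizes (1000 versus 50) and the 90-degree rotation are all assumptions chosen for illustration. The sketch measures the angle between the first principal axes of the two classes, then reconstructs each class from the first principal axis of the combined data and compares the mean squared reconstruction errors.

```python
import numpy as np


def first_principal_axis(X):
    """Return the unit vector of maximum variance of X (rows are samples)."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def angle_between_axes_deg(u, v):
    """Angle in [0, 90] degrees between two axes (sign of the vectors is ignored)."""
    cos = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))


def reconstruction_error(X, mean, axis):
    """Mean squared error after projecting X onto a single axis through `mean`."""
    Xc = X - mean
    projected = np.outer(Xc @ axis, axis)
    return float(np.mean(np.sum((Xc - projected) ** 2, axis=1)))


def make_class(n, angle_deg, rng):
    """Hypothetical elongated 2-D Gaussian cloud rotated by angle_deg."""
    theta = np.radians(angle_deg)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
    Z = rng.normal(size=(n, 2)) * np.array([3.0, 0.3])
    return Z @ rotation.T


rng = np.random.default_rng(0)
majority = make_class(1000, 0.0, rng)   # assumed majority class
minority = make_class(50, 90.0, rng)    # assumed minority class, rotated axis

separation = angle_between_axes_deg(first_principal_axis(majority),
                                    first_principal_axis(minority))

# PCA on the combined, imbalanced data: the majority class dominates the axis.
combined = np.vstack([majority, minority])
combined_mean = combined.mean(axis=0)
combined_axis = first_principal_axis(combined)

err_majority = reconstruction_error(majority, combined_mean, combined_axis)
err_minority = reconstruction_error(minority, combined_mean, combined_axis)

print(f"angular separation (degrees): {separation:.1f}")
print(f"majority reconstruction error: {err_majority:.3f}")
print(f"minority reconstruction error: {err_minority:.3f}")
```

Under these assumptions the combined principal axis follows the majority class, so the minority class is reconstructed with a much larger error. Changing the rotation angle passed to `make_class` toward zero should shrink that gap.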