Evaluating classification accuracy: the impact of resampling and dataset size
Online publication date: Tue, 13-Dec-2016
by Jehad Imlawi; Mohammad Alsharo
International Journal of Business Information Systems (IJBIS), Vol. 24, No. 1, 2017
Abstract: Correct prediction is an important criterion in evaluating classifiers in a supervised learning context. The accuracy rate is a widely accepted indicator of a classifier's probability of misclassification. Nevertheless, the true accuracy remains unknown in most cases, since it is not always possible to include the whole population in a study and it is difficult to calculate the probability distribution of the data. Researchers therefore often rely on estimates computed from the available data through sampling. When the available data is small or limited, it is common to rely on a resampling technique for accuracy estimation. In this paper, we study the impact of resampling versus non-resampling estimation methods, across different dataset sizes, on the variance of the sample distribution. Initial results indicate a significant difference in the variance of the sample distribution between resampling and non-resampling. We also found that the larger the dataset, the less significant the difference in variance.
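The comparison described in the abstract can be illustrated with a small simulation. The abstract does not name the resampling technique, the classifier, or the data used, so everything in the sketch below is an assumption made for illustration: an out-of-bag bootstrap stands in for "resampling", a single hold-out split stands in for "non-resampling", the classifier is a one-dimensional nearest-class-mean threshold, and the data are synthetic Gaussian draws. All function names are hypothetical.

```python
import random
import statistics


def make_data(n, rng):
    """Draw n synthetic (x, label) pairs: two unit-variance Gaussian classes."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data


def fit_threshold(train):
    """Nearest-class-mean classifier in 1-D: the midpoint of the class means."""
    x0 = [x for x, y in train if y == 0]
    x1 = [x for x, y in train if y == 1]
    if not x0 or not x1:
        return 1.0  # fallback when one class is absent from the training set
    return (statistics.fmean(x0) + statistics.fmean(x1)) / 2


def accuracy(threshold, test):
    """Fraction of test points whose predicted class matches the label."""
    return sum((x > threshold) == (y == 1) for x, y in test) / len(test)


def holdout_estimate(data, rng):
    """Non-resampling (assumed): one random 2/3 train, 1/3 test split."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return accuracy(fit_threshold(shuffled[:cut]), shuffled[cut:])


def bootstrap_estimate(data, rng, rounds=30):
    """Resampling (assumed): mean out-of-bag accuracy over bootstrap rounds."""
    n = len(data)
    scores = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(idx)
        oob = [data[i] for i in range(n) if i not in chosen]
        if oob:
            train = [data[i] for i in idx]
            scores.append(accuracy(fit_threshold(train), oob))
    # If no round left any out-of-bag points, fall back to a hold-out split.
    return statistics.fmean(scores) if scores else holdout_estimate(data, rng)


def estimator_variance(estimator, n, trials, rng):
    """Variance of an accuracy estimator across independently drawn datasets."""
    estimates = [estimator(make_data(n, rng), rng) for _ in range(trials)]
    return statistics.variance(estimates)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (30, 100, 300):
        v_holdout = estimator_variance(holdout_estimate, n, 60, rng)
        v_boot = estimator_variance(bootstrap_estimate, n, 60, rng)
        print(f"n={n}: hold-out variance={v_holdout:.5f}, "
              f"bootstrap variance={v_boot:.5f}")
```

Running the script prints the variance of each estimator at three dataset sizes, which makes it possible to see how the gap between the two behaves as the dataset grows. It is a toy setup and does not reproduce the paper's experiments or results.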