Evaluating the performance of regression algorithms on datasets with missing data
Online publication date: Sat, 28-Jun-2014
by Luciano Costa Blomberg; Daiane Hemerich; Duncan Dubugras Alcoba Ruiz
International Journal of Business Intelligence and Data Mining (IJBIDM), Vol. 8, No. 2, 2013
Abstract: Real-world applications frequently involve missing data, which makes data analysis a non-trivial task. This paper analyses six representative regression algorithms, evaluating their predictive performance and sensitivity to missing data. For this purpose, we used 20 public datasets and manipulated them to contain controlled levels of missing data. Our empirical analysis shows that RepTree is the least influenced by missing data, followed by LinearRegression. IBK is the most influenced, presenting the highest error. However, M5P remains the algorithm with the best predictive performance, although it is only the fourth least influenced by missing data.
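The abstract mentions manipulating datasets to hold controlled levels of missing data. As a rough illustration only, the sketch below shows one way such a manipulation could look. It assumes values are removed completely at random, which the abstract does not state; the function name inject_missing, the toy feature matrix, and the missingness levels are all hypothetical and are not taken from the paper.

```python
import random


def inject_missing(rows, rate, seed=0):
    """Return a copy of `rows` with a fraction `rate` of cells set to None.

    Assumption: cells are removed completely at random. The abstract does
    not say which missingness mechanism the authors actually used.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the masking reproducible
    # Enumerate every (row, column) position so the rate applies to cells.
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    n_missing = round(rate * len(cells))
    masked = [list(row) for row in rows]  # copy; the input is left untouched
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


# Hypothetical toy feature matrix: 10 instances, 4 numeric attributes.
features = [[float(i + j) for j in range(4)] for i in range(10)]

# Hypothetical missingness levels, chosen only for illustration.
for level in (0.1, 0.2, 0.4):
    masked = inject_missing(features, level)
    n_none = sum(value is None for row in masked for value in row)
    print(f"level={level:.1f} missing_cells={n_none}")
```

Each masked copy could then be passed to the regression algorithms under comparison, with the error at each level compared against the error on the complete data to gauge sensitivity.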