Title: Comparative regression performances of machine learning methods optimising hyperparameters: application to health expenditures
Authors: Songul Cinaroglu; Onur Baser
Addresses: Faculty of Economics & Administrative Sciences, Hacettepe University, Department of Health Care Management, Ankara, Turkey | Columbia University, Department of Surgery, Center for Innovation and Outcomes Research, New York, NY, USA; STATinMED Research, New York, NY, USA
Abstract: Machine learning (ML) algorithms are used in various areas. However, no study has analysed health expenditures using ML methods. This work compares the regression performance of lasso (L), K-nearest neighbours (KNN), random forest (RF) and support vector machine (SVM) regression under varying hyperparameter values. The hyperparameters were lambda (λ) for L, the number of neighbours (NN) for KNN, the number of trees (NT) for RF and epsilon (ε) for SVM regression. Regression performance was assessed using K-fold cross-validation. The results show that KNN (R² > 0.75; RMSE < 0.70; MAE < 0.55) and L (R² > 0.79; RMSE < 0.20; MAE < 0.15) regression yield better results in predicting health expenditure per capita and out-of-pocket health expenditure (%), respectively. Moreover, the performance differences among the L, KNN, RF and SVM regression methods are statistically significant (p < 0.001). It is hoped that these results will stimulate further interest in using ML methods to predict health expenditures.
Keywords: machine learning; lasso regression; nearest neighbours regression; random forest regression; support vector machine regression; hyperparameter optimisation; black-box optimisation; health expenditures.
DOI: 10.1504/IJBRA.2020.113022
International Journal of Bioinformatics Research and Applications, 2020 Vol.16 No.4, pp.387 - 407
Received: 04 Oct 2016
Accepted: 03 Feb 2018
Published online: 16 Feb 2021
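
The evaluation procedure described in the abstract (varying one hyperparameter, scoring each setting with K-fold cross-validation, and reporting R², RMSE and MAE) can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only KNN regression, uses a single synthetic feature rather than health expenditure data, and the function names (`knn_predict`, `regression_metrics`, `cross_validate_knn`), the fold count and the grid of neighbour values are all assumptions chosen for illustration.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Predict y for x as the mean target of its k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired observations and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n), mae


def cross_validate_knn(xs, ys, k_neighbours, n_folds=5, seed=0):
    """Pool out-of-fold predictions from K-fold CV and score them once."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    # Each fold takes every n_folds-th index of the shuffled order.
    folds = [order[f::n_folds] for f in range(n_folds)]
    y_true, y_pred = [], []
    for held_out in folds:
        held = set(held_out)
        train_x = [xs[i] for i in order if i not in held]
        train_y = [ys[i] for i in order if i not in held]
        for i in held_out:
            y_true.append(ys[i])
            y_pred.append(knn_predict(train_x, train_y, xs[i], k_neighbours))
    return regression_metrics(y_true, y_pred)


# Synthetic data (an assumption, not from the study): a noisy linear relation.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

# Hypothetical grid for the number of neighbours (NN).
results = {k: cross_validate_knn(xs, ys, k) for k in (1, 3, 5, 9, 15)}
for k, (r2, rmse, mae) in results.items():
    print(f"NN={k:2d}  R2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```

Pooling the out-of-fold predictions before scoring is one of several reasonable choices; averaging per-fold metrics is an equally common alternative, and the abstract does not say which was used. The same loop could in principle be repeated for λ, NT and ε with the corresponding regressors.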