Title: Application of ensemble methods for classification of water quality
Authors: Mohamad Sakizadeh
Addresses: Department of Environmental Sciences, Faculty of Sciences, Shahid Rajaee Teacher Training University, Shahid Shabanloo Avenue, Lavizan, Tehran, Iran
Abstract: Groundwater pollution in the Shoosh Aquifer, Khuzestan Province, Iran, was assessed using a data set collected from 30 sampling wells over an eight-year period. Cluster analysis produced a dendrogram in which the 30 sampling wells were grouped into three statistically significant clusters. Two classification methods, k-nearest neighbour (k-NN) and classification tree, were then used to classify the sampling stations by level of pollution. The optimum tree depth and number of neighbours were selected by 4-fold cross-validated misclassification error, and both tuned classifiers had an error of 0.167. Ensembles were then built from these base classifiers. In addition, because the sample size in this study was small, the random subspace method was combined with the k-NN ensemble as a feature selection method. The misclassification errors of the classification tree and k-NN ensembles were 0.13 and 0.10, respectively, lower than the 0.167 of the single classifiers, which confirms the high accuracy of ensemble methods for data classification.
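The abstract names the techniques but not their implementation. The following is a minimal sketch, not the author's code, of two of them: choosing the number of neighbours k by 4-fold cross-validated misclassification error, and a random subspace ensemble of k-NN classifiers combined by majority vote. Everything beyond those two ideas is an assumption made for illustration: squared Euclidean distance on raw features, plain majority voting, 15 ensemble members each using 3 of 6 features, a hypothetical synthetic three-class data set (not the Shoosh Aquifer data), and the function names themselves.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """k-NN vote using squared Euclidean distance on the given feature indices.
    Ties in the vote go to the label met first, i.e. the nearest neighbour's."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]


def make_subspaces(n_features, n_members, subspace_size, rng):
    """Random subspace method: each ensemble member gets a random feature subset."""
    return [rng.sample(range(n_features), subspace_size) for _ in range(n_members)]


def ensemble_predict(train_X, train_y, x, k, subspaces):
    """Majority vote over k-NN members, each restricted to its own subspace."""
    votes = Counter(knn_predict(train_X, train_y, x, k, s) for s in subspaces)
    return votes.most_common(1)[0][0]


def kfold_error(X, y, predict, n_folds=4, seed=0):
    """Misclassification error estimated by k-fold cross-validation.
    `predict(train_X, train_y, x)` returns a label for one held-out sample."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]  # disjoint, cover all samples
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        tr_X = [X[i] for i in idx if i not in held_out]
        tr_y = [y[i] for i in idx if i not in held_out]
        wrong += sum(predict(tr_X, tr_y, X[i]) != y[i] for i in fold)
    return wrong / len(X)


def make_toy_data(n_per_class=10, n_features=6, seed=1):
    """Hypothetical synthetic data: three classes with different feature means."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in enumerate((0.0, 3.0, 6.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    # Tune k for a single k-NN classifier by 4-fold misclassification error.
    errors = {
        k: kfold_error(X, y, lambda tX, ty, x, k=k: knn_predict(tX, ty, x, k, range(len(x))))
        for k in (1, 3, 5, 7)
    }
    best_k = min(errors, key=errors.get)
    # Random subspace k-NN ensemble using the tuned k.
    subspaces = make_subspaces(len(X[0]), n_members=15, subspace_size=3, rng=random.Random(2))
    ens_err = kfold_error(X, y, lambda tX, ty, x: ensemble_predict(tX, ty, x, best_k, subspaces))
    print("single k-NN errors:", errors, "best k:", best_k)
    print("random subspace ensemble error:", ens_err)
```

Drawing the subspaces once, outside the cross-validation loop, keeps the ensemble fixed while only the training folds change; with very few samples, as in a 30-well study, each held-out fold is small, so the estimated error moves in coarse steps.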
Keywords: groundwater contamination; classification methods; classification tree; k-nearest neighbour; k-NN; ensemble methods.
International Journal of Water, 2017, Vol. 11, No. 2, pp. 114-131
Received: 29 Jun 2015
Accepted: 04 Nov 2015
Published online: 22 Apr 2017