Title: An effective ensemble method for missing data imputation
Authors: Bikash Baruah; Manash P. Dutta; Dhruba K. Bhattacharyya
Addresses: Department of Computer Science and Engineering, NIT Arunachal Pradesh, India; Department of Computer Science and Engineering, NIT Arunachal Pradesh, India; Department of Computer Science and Engineering, Tezpur University, Tezpur, India
Abstract: Missing data in a dataset strongly affects the design of classification, clustering, and regression methods, and efficient missing data imputation can improve the overall performance of a machine learning method. This paper combines k-nearest neighbour imputation, local least square imputation, miss forest imputation, and k-means clustering imputation into an ensemble, using the bagging approach, to handle missing values over a wide range of datasets. The method is tested on eight different datasets in terms of root mean square error, median absolute percentage error, mean absolute percentage error, and standard deviation. Experimental results show that the method gives a lower error rate than its close competitors.
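As an illustration of the general idea in the abstract, the following is a minimal NumPy sketch of bagging-style ensemble imputation together with three of the named error measures. It is not the authors' implementation: only two simplified base imputers (column mean and a basic k-nearest neighbour) stand in for the four methods in the paper, and all function names, parameters, and the choice to average the base estimates are assumptions made for this example.

```python
import numpy as np


def mean_impute(target, donors):
    """Fill each missing cell of `target` with the donor column mean."""
    filled = target.copy()
    col_means = np.nanmean(donors, axis=0)
    rows, cols = np.where(np.isnan(target))
    filled[rows, cols] = col_means[cols]
    return filled


def knn_impute(target, donors, k=3):
    """Fill each missing cell with the mean of that column over the k
    nearest donor rows (RMS distance on the columns both rows observe)."""
    filled = target.copy()
    col_means = np.nanmean(donors, axis=0)
    for i, c in zip(*np.where(np.isnan(target))):
        observed = ~np.isnan(target[i])
        candidates = []
        for row in donors:
            shared = observed & ~np.isnan(row)
            # A donor is usable only if it has column c and overlaps the target row.
            if np.isnan(row[c]) or not shared.any():
                continue
            dist = np.sqrt(np.mean((target[i, shared] - row[shared]) ** 2))
            candidates.append((dist, row[c]))
        candidates.sort(key=lambda pair: pair[0])
        values = [value for _, value in candidates[:k]]
        # Fall back to the donor column mean when no usable neighbour exists.
        filled[i, c] = np.mean(values) if values else col_means[c]
    return filled


def bagged_ensemble_impute(X, imputers, n_bags=10, seed=0):
    """Average the estimates of every base imputer over bootstrap donor sets."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_bags):
        # Bootstrap: draw donor rows with replacement (assumed bagging scheme).
        idx = rng.integers(0, len(X), size=len(X))
        donors = X[idx]
        for impute in imputers:
            estimates.append(impute(X, donors))
    return np.mean(estimates, axis=0)


def rmse(true, pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((true - pred) ** 2)))


def mape(true, pred):
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    return float(np.mean(np.abs((true - pred) / true)) * 100)


def mdape(true, pred):
    """Median absolute percentage error, in percent (true values must be non-zero)."""
    return float(np.median(np.abs((true - pred) / true)) * 100)
```

A typical evaluation under these assumptions would hide a known subset of cells in a complete dataset, call `bagged_ensemble_impute(X, [mean_impute, knn_impute])`, and compute `rmse`, `mape`, and `mdape` only over the hidden cells.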
Keywords: missing data imputation; ensemble method; k-nearest neighbour; KNN; local least square; LLC; miss forest; k-means clustering; KMC.
DOI: 10.1504/IJICS.2023.128846
International Journal of Information and Computer Security, 2023 Vol.20 No.3/4, pp.295 - 314
Received: 25 Mar 2021
Accepted: 26 Jul 2021
Published online: 07 Feb 2023