Title: Instance driven clustering for the imputation of missing data in KDD
Authors: P. Ilango; K. Vijayakumar; M. Rajasekhara Babu
Addresses: School of Computing Science and Engineering, VIT University, Vellore – 632014, Tamilnadu, India; School of Computing Science and Engineering, VIT University, Vellore – 632014, Tamilnadu, India; School of Computing Science and Engineering, VIT University, Vellore – 632014, Tamilnadu, India
Abstract: Ongoing research and development in medical data mining has opened up versatile computer-assisted approaches to effective clinical decision-making. The nature and quality of the sample selected for training largely determine the performance of data mining algorithms. The large quantities of cumulative data collected from various sources suffer from quality deficiencies such as inconsistency, incompleteness and redundancy. Addressing the primary problem of missing data is vital, as missing values may introduce bias into the model under evaluation and at times lead to inaccurate results. This paper proposes imputing missing data through an instance-based clustering methodology. A complete dataset, Pima Indian Type II Diabetes, is used to evaluate the proposed method, and its usefulness and performance are estimated through the average imputation error (E). The results illustrate that the proposed clustering method gives a lower and stable error rate compared with other existing imputation methods.
Keywords: data mining; machine learning; missing data; imputation methods; INST_CLUST_IMPUTE; average imputation error.
DOI: 10.1504/IJCNDS.2014.057988
International Journal of Communication Networks and Distributed Systems, 2014 Vol.12 No.1, pp.69 - 81
Published online: 21 Jun 2014