Title: A clustering-based hybrid approach for dual data reduction
Authors: Saroj Ratnoo; Seema Rathee; Jyoti Ahuja
Addresses: Department of Computer Science and Engineering, Guru Jambheshwar University of Science and Technology, Hisar-125001, India; Department of Computer Science and Engineering, Guru Jambheshwar University of Science and Technology, Hisar-125001, India; Department of Computer Science, Government Post Graduate College for Women, Rohtak-124001, India
Abstract: Research on data reduction techniques has become important for enhancing the efficacy and efficiency of data mining algorithms, which may otherwise be compromised by a large number of irrelevant attributes and redundant instances. Data can be reduced by selecting a subset of either attributes or instances. Dual selection treats feature selection and instance selection together as a single optimisation problem. This problem is relatively difficult because it involves an enormously large search space. In this paper, we propose a hybrid instance-feature selection method (HIFS-CHC) that uses the CHC adaptive search genetic algorithm, characterised by heterogeneous recombination and cataclysmic mutation, to solve the dual selection problem. The proposed approach works in two stages. In the first stage, the K-means clustering algorithm is used to reduce the search space. The second stage incorporates stratified prototype selection and the CHC algorithm for data reduction. The clustering-based hybrid scheme is experimentally tested on sixteen benchmark datasets and compared with other similar data reduction algorithms with respect to predictive accuracy, reduction rate and execution time. Experimental results show that the proposed method outperforms the other methods in terms of reduction rate and execution time while preserving predictive accuracy at almost the same level.
Keywords: Feature selection; instance selection; dual selection; data reduction; hybrid evolutionary approach.
DOI: 10.1504/IJIEI.2018.094511
International Journal of Intelligent Engineering Informatics, 2018 Vol.6 No.5, pp.468 - 490
Received: 26 Jan 2018
Accepted: 15 Mar 2018
Published online: 04 Sep 2018
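
Illustrative note: the abstract describes dual selection as a single optimisation problem judged on predictive accuracy and reduction rate. The Python sketch below shows one way such a candidate solution could be scored. It is not the authors' method: the binary-mask encoding, the 1-NN evaluation on the full dataset, the weighted-sum fitness, the weight `alpha`, and all function names are assumptions made here for illustration only.

```python
from typing import List, Sequence


def reduction_rate(instance_mask: Sequence[int], feature_mask: Sequence[int]) -> float:
    """Fraction of data-matrix cells removed by the two binary masks."""
    kept = sum(instance_mask) * sum(feature_mask)
    total = len(instance_mask) * len(feature_mask)
    return 1.0 - kept / total


def one_nn_accuracy(
    X: List[List[float]],
    y: List[int],
    instance_mask: Sequence[int],
    feature_mask: Sequence[int],
) -> float:
    """Classify every instance with 1-NN, using only the selected
    instances as prototypes and only the selected features for distance.
    Assumption: no leave-one-out, so a prototype always matches itself."""
    prototypes = [i for i, keep in enumerate(instance_mask) if keep]
    features = [j for j, keep in enumerate(feature_mask) if keep]
    if not prototypes or not features:
        return 0.0  # an empty selection cannot classify anything
    correct = 0
    for i in range(len(X)):
        nearest = min(
            prototypes,
            key=lambda p: sum((X[i][j] - X[p][j]) ** 2 for j in features),
        )
        correct += int(y[nearest] == y[i])
    return correct / len(X)


def fitness(
    X: List[List[float]],
    y: List[int],
    instance_mask: Sequence[int],
    feature_mask: Sequence[int],
    alpha: float = 0.5,
) -> float:
    """Hypothetical weighted sum of accuracy and reduction rate."""
    acc = one_nn_accuracy(X, y, instance_mask, feature_mask)
    red = reduction_rate(instance_mask, feature_mask)
    return alpha * acc + (1.0 - alpha) * red


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [1.1, 2.0]]
    y = [0, 0, 1, 1]
    # Keep instances 0 and 2, and feature 0 only.
    print(fitness(X, y, [1, 0, 1, 0], [1, 0]))
```

A search procedure such as a genetic algorithm would then evolve the two masks to maximise this score; the clustering stage mentioned in the abstract is not modelled here.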