Title: Clustering of text documents with keyword weighting function
Authors: A. Christy; G. Meera Gandhi; S. Vaithyasubramanian
Addresses: Faculty of Computing, Sathyabama Institute of Science and Technology, Chennai, India; Faculty of Computing, Sathyabama Institute of Science and Technology, Chennai, India; Department of Mathematics, Sathyabama Institute of Science and Technology, Chennai, India
Abstract: In this digital world, data is available in abundance everywhere and is growing at a phenomenal rate. Making data readily available for decision making is an important task of the data analyst. In this article, we propose an unsupervised learning algorithm for text document clustering that adopts a keyword weighting function. Documents are pre-processed, and relevant keywords are grouped together based on their weights. Clustered keyword weighting (CKW) takes each class in the training collection as a known cluster and searches for feature weights iteratively to optimise the clustering objective function, in order to retrieve the best clustering result. The performance of CKW is validated by clustering the BBC news text collection. Experiments were conducted with simple K-means and hierarchical clustering algorithms, and our keyword weighting and clustering approach showed improved cluster quality compared with the other methods.
Keywords: documents; cluster; unsupervised; feature; K-means; normalised.
International Journal of Intelligent Enterprise, 2019 Vol.6 No.1, pp.19 - 31
Received: 06 Mar 2018
Accepted: 02 May 2018
Published online: 04 Jun 2019
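
The general pipeline outlined in the abstract (weight the keywords of each document, then cluster the weighted vectors) can be illustrated with a minimal sketch. This is not the CKW algorithm itself: the abstract gives no formula for its weighting function or objective, so the sketch assumes smoothed TF-IDF as a stand-in weighting and plain K-means for the clustering step. The function names `tfidf_vectors` and `kmeans`, the whitespace tokeniser, the choice of the first k documents as initial centroids, and the toy documents are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors), one vector per document.

    Assumption: smoothed TF-IDF stands in for the paper's keyword weighting.
    """
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iterations=20):
    """Plain K-means; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy documents (hypothetical), interleaved so the two seeds differ in topic.
docs = [
    "football match goal team win",
    "market shares profit bank rise",
    "team goal football league",
    "bank profit market economy",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

Seeding the centroids with the first k documents keeps the example deterministic; a practical run would more likely use random restarts or a K-means++ style initialisation, and an iterative scheme such as the one the abstract describes would additionally update the feature weights between clustering passes.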