Title: Cluster quality analysis based on SVD, PCA-based k-means and NMF techniques: an online survey data
Authors: Hemangini Mohanty; Santilata Champati; B.L. Padmasani Barik; Anita Panda
Addresses: Centre for Data Science, Institute of Technical Education and Research, Siksha 'O' Anusandhan Deemed to be University, Bhubaneswar-751030, Odisha, India; Department of Mathematics, Institute of Technical Education and Research, Siksha 'O' Anusandhan Deemed to be University, Bhubaneswar-751030, Odisha, India; Department of Mathematics, Institute of Technical Education and Research, Siksha 'O' Anusandhan Deemed to be University, Bhubaneswar-751030, Odisha, India; Department of Mathematics, Institute of Technical Education and Research, Siksha 'O' Anusandhan Deemed to be University, Bhubaneswar-751030, Odisha, India
Abstract: With increasing computerisation in every field, huge amounts of data are being collected everywhere, so extracting useful information from them has become a necessary task. Data mining helps to extract this information and to uncover relationships among the data. Clustering is an unsupervised technique that partitions objects into several groups and discovers hidden relationships among them. Many techniques are used for clustering. This article presents a comparative study and analysis of three well-known techniques for clustering a database: principal component analysis (PCA), singular value decomposition (SVD) and non-negative matrix factorisation (NMF). The database was collected through a set of questionnaire surveys related to day-to-day activities. The natural clustering ability of the three techniques is compared, and the use of normalised mutual information (NMI) and purity as two cluster quality evaluation measures is explored. An attempt is then made to show how much of the information in the original data matrix is retained by the approximated data matrix. Next, the Frobenius norm is used to verify the accuracy of the variance covered by the approximated data matrix. Finally, the results are compared with the variance covered as computed from the singular values, and a detailed analysis of each data matrix is given.
Keywords: clustering; k-means; non-negative matrix factorisation; NMF; normalised mutual information; NMI; principal component analysis; PCA; purity; singular value decomposition; SVD.
DOI: 10.1504/IJRIS.2023.128368
International Journal of Reasoning-based Intelligent Systems, 2023 Vol.15 No.1, pp.86 - 96
Received: 06 May 2022
Accepted: 26 Jul 2022
Published online: 18 Jan 2023
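
The abstract names purity and normalised mutual information (NMI) as the two cluster quality measures. The sketch below is only an illustration of how these measures are commonly computed and is not taken from the article: the function names, the toy labels and the choice of normalising the mutual information by the geometric mean of the two entropies are all assumptions (other normalisations, such as the arithmetic mean or the maximum, are also in use).

```python
from collections import Counter
from math import log, sqrt


def purity(true_labels, cluster_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    per_cluster = {}
    for t, c in zip(true_labels, cluster_labels):
        per_cluster.setdefault(c, Counter())[t] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


def entropy(labels):
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum((k / n) * log(k / n) for k in Counter(labels).values())


def mutual_information(true_labels, cluster_labels):
    """Mutual information (natural log) between two labellings of the same points."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    t_counts = Counter(true_labels)
    c_counts = Counter(cluster_labels)
    return sum(
        (n_tc / n) * log(n * n_tc / (t_counts[t] * c_counts[c]))
        for (t, c), n_tc in joint.items()
    )


def nmi(true_labels, cluster_labels):
    """NMI with geometric-mean normalisation (an assumed convention)."""
    h_t, h_c = entropy(true_labels), entropy(cluster_labels)
    if h_t == 0.0 and h_c == 0.0:
        return 1.0  # both labellings are trivial and identical in structure
    if h_t == 0.0 or h_c == 0.0:
        return 0.0  # one labelling carries no information
    return mutual_information(true_labels, cluster_labels) / sqrt(h_t * h_c)


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the survey described above.
    truth = [0, 0, 0, 1, 1, 1]
    found = [1, 1, 1, 0, 0, 0]  # same partition, cluster ids swapped
    print(purity(truth, found), nmi(truth, found))
```

Both measures depend only on how the points are grouped, not on the cluster identifiers, so a clustering that matches the reference partition up to a relabelling scores the maximum on each.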