Title: The FRCK clustering algorithm for determining cluster number and removing outliers automatically
Authors: Yubin Guo; Yuhang Wu; Xiaopeng Zhang; Aofeng Bo; Ximing Li
Addresses: South China Agricultural University, Guangzhou 510642, China ' South China Agricultural University, Guangzhou 510642, China ' South China Agricultural University, Guangzhou 510642, China ' Guangzhou HolandAI Technology Co., Ltd., Guangzhou 510006, China ' South China Agricultural University, Guangzhou 510642, China
Abstract: Clustering is one of the most popular unsupervised techniques for grouping data. The K-means algorithm is widely used for its simplicity, ease of implementation and efficiency. However, the optimal cluster number for K-means is difficult to predict, and the algorithm is sensitive to outliers. In this paper, we divide outliers into two types and propose a clustering algorithm that removes both types of outliers and calculates the optimal cluster number in each clustering iteration. The algorithm is a fusion of rough clustering and K-means, abbreviated as the FRCK algorithm. Because the FRCK algorithm removes outliers precisely, the optimal cluster number can be determined more accurately, and the quality of the clustering result improves accordingly. Experiments show that the algorithm is effective.
Keywords: clustering; K-means clustering algorithm; optimal cluster number; outlier.
DOI: 10.1504/IJCSE.2021.118097
International Journal of Computational Science and Engineering, 2021 Vol.24 No.5, pp.485 - 494
Received: 15 Jun 2020
Accepted: 07 Jan 2021
Published online: 12 Oct 2021