Title: Comparing yinyang and fission-fusion algorithms for accelerating the k-means

Authors: Marcelo Kuchar Matte; Maria do Carmo Nicoletti

Addresses: UNIFACCAMP, Rua Guatemala 167, CEP: 13231-230, C.L. Paulista, SP, Brazil; Instituto Federal de Mato Grosso do Sul – IFMS, Rod. BR-060 S/N, CEP 79.240-000, Jardim, MS, Brazil ' UNIFACCAMP, Rua Guatemala 167, CEP: 13231-230, C.L. Paulista, SP, Brazil; Computer Science Department, Universidade Federal de S. Carlos, Rod. Washington Luiz S/N, CEP 13565-905, S. Carlos, SP, Brazil

Abstract: Given both, a set of data instances and an integer value k as input to the k-means clustering algorithm, a clustering represented as a partition of the given set into k groups (clusters) is produced by the algorithm. The k-means is considered relatively scalable and efficient in processing large sets of data instances. However, the inductive process the algorithm implements can have high time-related computational cost, depending on the given set of instances. Considering the algorithm invests most of its runtime in distance calculations, a way of accelerating its execution time is by avoiding calculations when possible. One way of doing that is by using a mathematical result known as triangle inequality. The main goal of the work described in this paper is to empirically investigate the contribution of the triangle inequality employed by two algorithms, the yinyang and the fission-fusion. Results from experiments using both algorithms on 17 datasets are evidence that, in general, the yinyang algorithm had a speed-up performance higher than that of the fission-fusion. However, in datasets where instances are described by a relatively small number of attributes (≤8), the fission-fusion had a better speed-up performance.

Keywords: k-means; optimisation; triangle inequality; accelerating algorithms; yinyang; fission-fusion.

DOI: 10.1504/IJKL.2024.139558

International Journal of Knowledge and Learning, 2024 Vol.17 No.4, pp.408 - 426

Received: 23 Feb 2022
Accepted: 05 Nov 2022

Published online: 04 Jul 2024 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article