Title: Hierarchical++: improving the hierarchical clustering algorithm

Authors: Wallace Anacleto Pinheiro; Ana Bárbara Sapienza Pinheiro

Addresses: Systems Development Center, Brazilian Army, Brasília, DF, Brazil; Department of Tropical Medicine, Brazilian University, Brasília, DF, Brazil

Abstract: Hierarchical clustering is a widely used grouping strategy. However, it often yields poorer results than other approaches, such as K-means clustering. In addition, many algorithms try to correct the failures of hierarchical clustering by revisiting (refactoring) intermediate merge decisions, which may worsen performance. In this work, we propose a new set of procedures that modify the hierarchical technique to improve its results. The idea is to get each merge right the first time, avoiding the need to refactor previous steps. These modifications rely on the concept of golden boxes, built from initial points called seeds, which indicate groups that must remain disconnected. To assess our strategy, we compare several approaches: traditional hierarchical clustering (single-link, complete-link, average, weighted, centroid, and median linkage), K-means, K-means++, and the proposed method, named Hierarchical++. An experimental evaluation indicates that our proposal far surpasses the compared strategies.
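The abstract describes golden boxes only at a high level: seeds mark groups that must stay disconnected while agglomeration proceeds. One way to picture that rule is a bottom-up merge loop that refuses to join two clusters that each already contain a seed. The sketch below is a hypothetical illustration under stated assumptions, not the authors' Hierarchical++ algorithm: the K-means++-style seed selection (`pick_seeds`), the average-linkage distance (`average_link`), the merge loop (`seed_constrained_agglomerative`), and the stopping rule (one cluster per seed) are all choices made for this example.

```python
import math
import random
from typing import List, Sequence

Points = Sequence[Sequence[float]]


def pick_seeds(points: Points, k: int, rng: random.Random) -> List[int]:
    """Choose up to k seed indices with K-means++-style D^2 sampling.

    Assumption for illustration: the abstract does not say how seeds are chosen.
    """
    seeds = [rng.randrange(len(points))]
    while len(seeds) < k:
        # Squared distance from each point to its nearest seed chosen so far.
        d2 = [min(math.dist(p, points[s]) ** 2 for s in seeds) for p in points]
        if sum(d2) == 0:
            break  # every remaining point coincides with a seed
        # Points already used as seeds have weight 0, so they are never re-picked.
        seeds.append(rng.choices(range(len(points)), weights=d2, k=1)[0])
    return seeds


def average_link(points: Points, a: List[int], b: List[int]) -> float:
    """Mean pairwise distance between two clusters (average linkage, an assumed choice)."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def seed_constrained_agglomerative(points: Points, seeds: List[int]) -> List[List[int]]:
    """Bottom-up merging in which two clusters that both hold a seed never merge.

    Hypothetical reading of the "golden box" idea; stops at one cluster per seed.
    """
    seed_set = set(seeds)
    if not seed_set:
        raise ValueError("need at least one seed")
    clusters = [[i] for i in range(len(points))]
    seeded = [i in seed_set for i in range(len(points))]
    while len(clusters) > len(seed_set):
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if seeded[a] and seeded[b]:
                    continue  # golden-box rule: keep seeded groups disconnected
                d = average_link(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        seeded[a] = seeded[a] or seeded[b]
        del clusters[b], seeded[b]  # b > a, so index a stays valid
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    chosen = pick_seeds(pts, k=2, rng=random.Random(0))
    print(chosen, seed_constrained_agglomerative(pts, chosen))
```

Run as a script, it prints the chosen seed indices and the resulting clusters. Because merges between two seeded clusters are blocked, each final cluster holds exactly one seed, so the seeds play the role of groups that "must remain disconnected". The paper's actual procedure may differ in how seeds are picked, which linkage is used, and when merging stops.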

Keywords: clustering; grouping; similarity; golden boxes; complex distributions; dendrograms; hierarchical; K-means; seed; centroid.

DOI: 10.1504/IJDMMM.2023.132975

International Journal of Data Mining, Modelling and Management, 2023 Vol.15 No.3, pp. 223-239

Accepted: 27 Oct 2022
Published online: 22 Aug 2023
