Title: Hierarchical++: improving the hierarchical clustering algorithm
Authors: Wallace Anacleto Pinheiro; Ana Bárbara Sapienza Pinheiro
Addresses: Systems Development Center, Brazilian Army, Brasília, DF, Brazil; Department of Tropical Medicine, Brazilian University, Brasília, DF, Brazil
Abstract: Hierarchical grouping is a widely used clustering strategy. However, this technique often yields poorer results than other approaches, such as K-means clustering. In addition, many algorithms try to correct the failures of hierarchical clustering by refactoring intermediate cluster-merging steps, which may worsen performance. In this work, we propose a new set of procedures that alter the hierarchical technique to improve its results. The idea is to do it right the first time, avoiding the refactoring of previous steps. These modifications involve the concept of golden boxes, based on initial points named seeds, which indicate groups that must remain disconnected. To assess our strategy, we compare the results of several approaches: traditional hierarchical clustering (single-link, complete-link, average, weighted, centroid, and median), K-means, K-means++, and the proposed method, named Hierarchical++. An experimental evaluation indicates that our proposal far surpasses the compared strategies.
Keywords: clustering; grouping; similarity; golden boxes; complex distributions; dendrograms; hierarchical; K-means; seed; centroid.
DOI: 10.1504/IJDMMM.2023.132975
International Journal of Data Mining, Modelling and Management, 2023 Vol.15 No.3, pp.223 - 239
Accepted: 27 Oct 2022
Published online: 22 Aug 2023
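
Illustrative sketch: the abstract compares the proposal against traditional agglomerative (bottom-up) hierarchical clustering. As a rough illustration of that baseline only, the Python sketch below merges the two closest clusters until k clusters remain, under single, complete, or average linkage. It does not implement golden boxes or seeds, which the abstract does not specify in enough detail to reproduce. The function name agglomerative, its parameters, and the sample points are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def agglomerative(points, k, linkage="single"):
    """Naive agglomerative clustering (hypothetical helper, baseline only).

    Starts with one cluster per point and repeatedly merges the two
    closest clusters until k clusters remain. Returns lists of point indices.
    """
    # How pairwise point distances are combined into a cluster distance.
    combine = {
        "single": min,                            # closest pair of points
        "complete": max,                          # farthest pair of points
        "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
    }[linkage]

    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        return combine([dist(points[i], points[j]) for i in a for j in b])

    while len(clusters) > k:
        # Find the pair of clusters (by position, a < b) with the smallest distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        merged = clusters.pop(b)   # b > a, so position a stays valid
        clusters[a].extend(merged)

    return [sorted(c) for c in clusters]


# Two well-separated groups of made-up points.
sample = [(0, 0), (0, 1), (0, 2), (10, 0), (10, 1)]
result = agglomerative(sample, k=2, linkage="single")
```

Every merge in this baseline is final, so an early wrong merge is never undone; that is the weakness the abstract says the proposal addresses by getting merges right the first time rather than refactoring them afterwards. The sketch recomputes all cluster distances at every step, so it is only suitable for small inputs.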