Title: Information criterion-based non-hierarchical clustering
Authors: Isamu Nagai; Katsuyuki Takahashi; Hirokazu Yanagihara
Addresses: Department of International Liberal Studies, School of International Liberal Studies, Chukyo University, Aichi, Nagoya, Japan; Department of Social System and Management, Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki, Tsukuba, Japan; Department of Mathematics, Graduate School of Science, Hiroshima University, Hiroshima, Higashi-Hiroshima, Japan
Abstract: In the analysis of real data, it is important to determine whether the data contain clusters. This can be done using one of several methods of cluster analysis, which can be roughly divided into hierarchical and nonhierarchical clustering methods. Nonhierarchical clustering can be applied to more types of data than hierarchical clustering (see e.g., Saito and Yadohisa, 2006); hence, in this paper, we focus on nonhierarchical clustering. In nonhierarchical clustering, the results depend heavily on the number of clusters, so selecting an appropriate number of clusters is very important. Bozdogan (1986) and Manning et al. (2009, Section 16.4.1) used formal information criteria, such as Akaike's information criterion (AIC), to select the number of clusters. In this paper, we show through numerical examinations that such formal information criteria perform poorly at selecting the number of clusters. We therefore extend a formal AIC by adding a new penalty term and, through numerical experiments, search for an additional penalty that yields acceptable selection performance.
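To make the idea of a "formal AIC" for choosing the number of clusters concrete, here is a minimal sketch, assuming the simplified k-means form RSS_min(K) + 2MK (M = data dimension) of the kind described in Manning et al. (2009, Section 16.4.1). It is not the authors' extended criterion; their additional penalty term is not given in this abstract. The function names (`kmeans_rss`, `select_k_formal_aic`), the k-means++ seeding, the restart count and the toy data are all hypothetical illustration choices.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _kmeans_pp_init(data: List[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding (an assumed choice): spread the initial centres out."""
    centers = [rng.choice(data)]
    while len(centers) < k:
        d2 = [min(_sq_dist(p, c) for c in centers) for p in data]
        total = sum(d2)
        if total == 0.0:  # every point already coincides with a centre
            centers.append(rng.choice(data))
            continue
        r = rng.uniform(0.0, total)
        acc = 0.0
        for p, w in zip(data, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
        else:  # guard against floating-point round-off
            centers.append(data[-1])
    return centers


def kmeans_rss(data: List[Point], k: int, n_restarts: int = 10,
               max_iter: int = 100, seed: int = 0) -> float:
    """Smallest within-cluster residual sum of squares found by Lloyd's k-means."""
    rng = random.Random(seed)
    dim = len(data[0])
    best = float("inf")
    for _ in range(n_restarts):
        centers = _kmeans_pp_init(data, k, rng)
        for _ in range(max_iter):
            # Assignment step: index of the nearest centre for every point.
            labels = [min(range(k), key=lambda j: _sq_dist(p, centers[j])) for p in data]
            # Update step: each centre moves to the mean of its assigned points.
            new_centers = []
            for j in range(k):
                members = [p for p, lab in zip(data, labels) if lab == j]
                if members:
                    new_centers.append(tuple(sum(p[d] for p in members) / len(members)
                                             for d in range(dim)))
                else:
                    new_centers.append(centers[j])  # keep an empty cluster's centre
            if new_centers == centers:
                break
            centers = new_centers
        rss = sum(min(_sq_dist(p, c) for c in centers) for p in data)
        best = min(best, rss)
    return best


def select_k_formal_aic(data: List[Point], k_max: int, **kw) -> Tuple[int, Dict[int, float]]:
    """Return the K minimising RSS_min(K) + 2*M*K, plus all criterion values."""
    m = len(data[0])
    scores = {k: kmeans_rss(data, k, **kw) + 2 * m * k
              for k in range(1, min(k_max, len(data)) + 1)}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical toy data: three tight groups of five points in the plane.
    offsets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]
    centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    data = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]
    k_hat, scores = select_k_formal_aic(data, k_max=5)
    print(k_hat, {k: round(v, 2) for k, v in scores.items()})
```

Because this form uses the residual sum of squares in place of −2 log-likelihood, it implicitly fixes the noise variance at one, so rescaling the data can change which K is selected. Sensitivities of this kind are one reason a formal criterion may select the number of clusters poorly, which is the behaviour the abstract reports and the motivation for adding a further penalty term.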
Keywords: Akaike's information criterion; AIC; cluster analysis; information criterion; k-means procedure; multivariate linear regression model; non-hierarchical clustering.
DOI: 10.1504/IJKESDP.2017.089504
International Journal of Knowledge Engineering and Soft Data Paradigms, 2017 Vol.6 No.1, pp.1 - 43
Accepted: 03 Feb 2017
Published online: 29 Jan 2018