Title: Biclustering of diabetic nephropathy and diabetic retinopathy microarray data using a similarity-based biclustering algorithm
Authors: Titin Siswantining; Alhadi Bustamam; Sofia Debi Puspa; Zuherman Rustam; Fahrezal Zubedi
Addresses: Faculty of Mathematics and Natural Sciences, Department of Mathematics, Universitas Indonesia, Pondok Cina, Depok, 16424, Indonesia; Faculty of Mathematics and Natural Sciences, Department of Mathematics, Universitas Indonesia, Pondok Cina, Depok, 16424, Indonesia; Faculty of Mathematics and Natural Sciences, Department of Mathematics, Universitas Indonesia, Pondok Cina, Depok, 16424, Indonesia; Faculty of Mathematics and Natural Sciences, Department of Mathematics, Universitas Indonesia, Pondok Cina, Depok, 16424, Indonesia; Faculty of Mathematics and Natural Sciences, Department of Mathematics, Universitas Negeri Gorontalo, Gorontalo, 96128, Indonesia
Abstract: The similarity-based biclustering (SBB) algorithm consists of four main phases: data transformation, construction of the row (gene) and column (condition) similarity matrices, clustering of each similarity matrix, and extraction of the biclusters. In this study, we modified the data transformation stage of the SBB algorithm to use min-max normalisation, in order to identify significant biclusters in diabetic nephropathy and retinopathy microarray data after gene selection by relative and absolute deviations. In a comparison based on the silhouette index, SBB with partitioning around medoids (PAM) clustered genes and samples better than SBB with K-means or agglomerative hierarchical clustering (AHC) with Ward's linkage. Furthermore, the proposed technique identified a meaningful non-overlapping bicluster on a real dataset. Using gene ontology (GO) enrichment analysis with the Bonferroni correction, we identified biological evidence in each bicluster that is significant in terms of gene functions and biological processes.
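The first two phases named in the abstract (data transformation by min-max normalisation, then construction of gene and condition similarity matrices) can be illustrated with a minimal sketch. Several details here are assumptions rather than the authors' method: the abstract does not say whether normalisation is applied per gene, per condition, or globally (per gene is assumed), nor which similarity measure is used (1 / (1 + Euclidean distance) is assumed). The function names and the toy expression matrix are hypothetical.

```python
import math


def min_max_normalise(matrix):
    """Rescale each row (gene) to [0, 1]. Per-row scaling is an assumption."""
    scaled = []
    for row in matrix:
        lo, hi = min(row), max(row)
        span = hi - lo
        # A constant row has no spread; map it to 0.0 to avoid dividing by zero.
        scaled.append([(v - lo) / span if span else 0.0 for v in row])
    return scaled


def similarity_matrix(vectors):
    """Pairwise similarity 1 / (1 + Euclidean distance); an assumed measure."""
    n = len(vectors)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vectors[i], vectors[j])
            sim[i][j] = sim[j][i] = 1.0 / (1.0 + d)
    return sim


def transpose(matrix):
    """Swap rows and columns so conditions become the vectors."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical toy data: 3 genes (rows) measured under 3 conditions (columns).
expression = [
    [2.0, 4.0, 6.0],
    [10.0, 20.0, 30.0],
    [5.0, 5.0, 1.0],
]

scaled = min_max_normalise(expression)
gene_sim = similarity_matrix(scaled)             # row (gene) similarities
cond_sim = similarity_matrix(transpose(scaled))  # column (condition) similarities
```

In this toy matrix the first two genes differ only in scale, so after normalisation they coincide and their assumed similarity is maximal. Each similarity matrix would then be passed to a clustering method such as PAM, which is not sketched here.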
Keywords: agglomerative hierarchical clustering; biclustering; diabetic nephropathy; diabetic retinopathy; gene expression; K-means; microarray data; PAM; partitioning around medoids; SBB; similarity-based biclustering.
DOI: 10.1504/IJBRA.2021.117934
International Journal of Bioinformatics Research and Applications, 2021 Vol.17 No.4, pp.343 - 362
Accepted: 13 Dec 2019
Published online: 05 Oct 2021