K-walks: clustering gene-expression data using a K-means clustering algorithm optimised by random walks
Online publication date: Sat, 29-Oct-2016
by Min Yao; Qinghua Wu; Juan Li; Tinghua Huang
International Journal of Data Mining and Bioinformatics (IJDMB), Vol. 16, No. 2, 2016
Abstract: Gene-expression data obtained from biological experiments typically have thousands of dimensions, which makes them difficult for biologists to interpret as a whole. Clustering is an exploratory data-mining technique that is widely used in gene-expression data analysis. Practical approaches to the clustering problem use iterative procedures such as K-means, which typically converge to one of many local minima. Here, we propose a simulated annealing approximation algorithm, optimised using random walks, to solve the K-means clustering problem. The algorithm is verified on synthetic and real-world data sets and compared with other well-known K-means variants. The new algorithm is less sensitive to the initial cluster centres, and its primary strength is its ability to produce high-quality clustering results for data sets containing thousands of high-dimensional data points. However, the algorithm is computationally intensive.
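The general idea of escaping K-means local minima with simulated annealing and random-walk moves can be illustrated with a minimal sketch. This is not the authors' K-walks algorithm, whose details are not given in the abstract. It is a generic illustration under stated assumptions: centres are perturbed by a Gaussian random walk, refined by one standard K-means update, and accepted or rejected by a Metropolis rule under a geometric cooling schedule. The names `anneal_kmeans`, `lloyd_step` and `sse`, and the parameters `steps`, `t0`, `cooling` and `step_size`, are hypothetical and chosen only for this example.

```python
import math
import random


def sse(points, centres):
    """K-means objective: sum of squared distances to the nearest centre."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centres)
        for pt in points
    )


def lloyd_step(points, centres):
    """One K-means update: assign each point to its nearest centre, then average."""
    groups = [[] for _ in centres]
    for pt in points:
        nearest = min(
            range(len(centres)),
            key=lambda i: sum((p - c) ** 2 for p, c in zip(pt, centres[i])),
        )
        groups[nearest].append(pt)
    # An empty cluster keeps its previous centre.
    return [
        tuple(sum(col) / len(g) for col in zip(*g)) if g else ctr
        for g, ctr in zip(groups, centres)
    ]


def anneal_kmeans(points, k, steps=200, t0=1.0, cooling=0.97, step_size=0.5, seed=0):
    """Illustrative annealed K-means (assumed scheme, not the published method)."""
    rng = random.Random(seed)
    current = rng.sample(points, k)  # initial centres drawn from the data
    current_cost = sse(points, current)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        # Random-walk proposal: jitter every centre, then refine with one update.
        proposal = [
            tuple(c + rng.gauss(0.0, step_size) for c in ctr) for ctr in current
        ]
        proposal = lloyd_step(points, proposal)
        cost = sse(points, proposal)
        # Metropolis rule: always accept improvements; accept worse solutions
        # with a probability that shrinks as the temperature falls.
        if cost <= current_cost or rng.random() < math.exp((current_cost - cost) / temp):
            current, current_cost = proposal, cost
            if cost < best_cost:
                best, best_cost = proposal, cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost
```

Calling `anneal_kmeans(points, k)` on a list of equal-length numeric tuples returns the best centres found and their objective value. Each iteration costs a full pass over the data, which is in line with the abstract's remark that this style of search is computationally intensive.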