Parallel algorithms for clustering biological graphs on distributed and shared memory architectures
Online publication date: Tue, 29-Jul-2014
by Inna Rytsareva; Timothy Chapman; Ananth Kalyanaraman
International Journal of High Performance Computing and Networking (IJHPCN), Vol. 7, No. 4, 2014
Abstract: Graph algorithms on parallel architectures present an interesting case study for irregular applications. In this paper, we address one such irregular application: clustering real-world graphs constructed from biological data using parallel computers. We present the design and evaluation of two different parallel implementations of a serial graph clustering heuristic called the Shingling heuristic, which was developed by Gibson et al. In the OpenMP shared memory implementation pClust-sm, we were able to improve both the asymptotic runtime and memory complexities of the serial implementation, and drastically reduce the time to solution from the order of several days to a few minutes on larger inputs (∼100 M edges). With the Hadoop MapReduce implementation pClust-mr, we were able to demonstrate linear scaling up to 64 cores on modest-sized inputs (∼11 M edges) and extend the reachable problem size by about two orders of magnitude relative to a serial implementation.
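To give a feel for the kind of computation being parallelized, here is a minimal, serial sketch of min-wise shingling applied to a graph. It is a simplified illustration under stated assumptions, not the authors' pClust-sm or pClust-mr code and not a full reproduction of Gibson et al.'s heuristic (which applies shingling in two passes before extracting clusters). Assumptions: seeded `blake2b` hashes stand in for random permutations, each vertex's closed neighborhood (its neighbors plus itself) is shingled, each shingle is tagged with its permutation index, and vertices that share any shingle are merged with union-find in a single pass. The function names `shingles`, `group_by_shingle` and `cluster` and the parameters `s` (shingle size) and `c` (number of permutations) are hypothetical.

```python
import hashlib
from collections import defaultdict


def _rank(seed, item):
    # Deterministic pseudo-random rank of `item` under hash function `seed`;
    # this stands in for a random permutation of vertex ids (an assumption).
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def shingles(neighbors, s, c):
    """Return up to c shingles of size s for one neighbor set."""
    out = set()
    if len(neighbors) < s:
        return out  # too few neighbors to form a shingle
    for seed in range(c):
        # The s smallest elements under permutation `seed` form one shingle.
        smallest = sorted(neighbors, key=lambda u: _rank(seed, u))[:s]
        out.add((seed, tuple(sorted(smallest))))
    return out


def group_by_shingle(graph, s, c):
    # "Map" step: emit (shingle, vertex); "reduce" step: collect vertices per shingle.
    groups = defaultdict(set)
    for v, nbrs in graph.items():
        # Hypothetical choice: shingle the closed neighborhood (v included).
        for sh in shingles(set(nbrs) | {v}, s, c):
            groups[sh].add(v)
    # Only shingles shared by two or more vertices link anything together.
    return {sh: vs for sh, vs in groups.items() if len(vs) > 1}


def cluster(graph, s=2, c=20):
    """Single-pass simplification: vertices sharing any shingle are merged."""
    parent = {v: v for v in graph}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for vs in group_by_shingle(graph, s, c).values():
        first, *rest = vs
        for v in rest:
            parent[find(v)] = find(first)

    comps = defaultdict(set)
    for v in graph:
        comps[find(v)].add(v)
    return sorted(sorted(members) for members in comps.values())
```

For example, on a graph made of two disjoint 4-cliques plus an isolated vertex, every vertex in a clique has the same closed neighborhood, so the clique members produce identical shingles and end up in one cluster, while the isolated vertex has too few neighbors to form a shingle and stays a singleton. The "map"/"reduce" comments only indicate where a grouping step of this kind could be distributed; they do not describe how pClust-mr is actually organized.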