Title: Iterative statistical kernels on contemporary GPUs

Authors: Thilina Gunarathne; Bimalee Salpitikorala; Arun Chauhan; Geoffrey Fox

Addresses: School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA (all authors)

Abstract: We present a study of OpenCL implementations of three important kernels that occur frequently in iterative statistical applications: multi-dimensional scaling (MDS), PageRank and K-means clustering. We evaluated their performance on NVIDIA Tesla and Fermi GPGPU cards using dedicated hardware and, in the case of Fermi, also on the Amazon EC2 cloud-computing environment. We explored the optimisation of these kernels using four main techniques: 1) caching invariant data in GPU memory across iterations; 2) selectively placing data in different memory levels; 3) rearranging data in memory; and 4) dividing the work between the GPU and the CPU. We also implemented a novel algorithm for MDS and a novel data layout scheme for PageRank. Our optimisations resulted in performance improvements of up to 5× to 6× over naïve OpenCL implementations, and of up to 100× over a single-core CPU. We believe that these categories of optimisations are also applicable to other similar kernels.

Keywords: graphics processing unit; OpenCL; multi-dimensional scaling; MDS; PageRank; K-means clustering; iterative statistical kernels; cloud GPUs; sparse matrix-vector multiplication; computational science; Amazon EC2 GPU instances.

DOI: 10.1504/IJCSE.2013.052118

International Journal of Computational Science and Engineering, 2013 Vol.8 No.1, pp.58 - 77

Received: 28 Jan 2012
Accepted: 19 Mar 2012

Published online: 27 Dec 2013
