Title: Network based prediction of protein localisation using diffusion Kernel

Authors: Ananda Mondal; Jianjun Hu

Addresses: Department of Computer Science and Engineering, University of South Carolina, 301 Main St, Columbia, SC 29036, USA; Computer Science Department, 301 Main St. Columbia, SC 29208, USA

Abstract: We present NetLoc, a novel diffusion Kernel-based Logistic Regression (KLR) algorithm for predicting protein subcellular localisation using four types of protein networks: physical Protein-Protein Interaction (PPI) networks, genetic PPI networks, mixed PPI networks and co-expression networks. NetLoc is applied to yeast protein localisation prediction. The results show that protein networks can provide rich information for protein localisation prediction, achieving an Area Under the Curve (AUC) score of 0.93. We also show that networks with high connectivity and a high percentage of co-localised PPIs lead to better prediction performance. Further investigation shows that NetLoc is a very robust approach: it produces good performance (AUC = 0.75) using only 30% of the original interactions, and it achieves overall accuracy greater than 0.5 with only 20% annotation coverage. Compared with the previous network-feature-based prediction algorithm, which achieved AUC scores of 0.49 and 0.52 on the yeast PPI network, NetLoc achieved significantly better overall performance, with an AUC of 0.74.
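To make the central idea concrete, the following is a minimal sketch of a graph diffusion kernel and of kernel-weighted neighbour features of the kind a kernel-based logistic regression could take as input. It is not the authors' implementation: the standard form K = exp(-beta * L) with L = D - A is assumed, and the function names, the toy five-protein network, the label encoding and beta = 0.5 are all hypothetical choices made for illustration.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=1.0):
    """Return K = exp(-beta * L), where L = D - A is the graph Laplacian.

    Assumes an undirected network, i.e. a square symmetric adjacency matrix.
    """
    a = np.asarray(adjacency, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or not np.allclose(a, a.T):
        raise ValueError("adjacency must be square and symmetric")
    laplacian = np.diag(a.sum(axis=1)) - a
    # L is symmetric, so the matrix exponential follows from its
    # eigendecomposition: exp(-beta * L) = V diag(exp(-beta * w)) V^T.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


def kernel_weighted_votes(kernel, labels):
    """Sum kernel weights from each protein to its labelled partners.

    labels: 1 = annotated to the compartment, 0 = annotated elsewhere,
    -1 = unannotated (hypothetical encoding).
    Returns an (n, 2) array: column 0 is the weight towards positives,
    column 1 the weight towards negatives.
    """
    labels = np.asarray(labels)
    k = np.array(kernel, dtype=float)
    np.fill_diagonal(k, 0.0)  # a protein does not vote for itself
    positive = k @ (labels == 1).astype(float)
    negative = k @ (labels == 0).astype(float)
    return np.column_stack([positive, negative])


# Toy path network of five hypothetical proteins: 0 - 1 - 2 - 3 - 4
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0

K = diffusion_kernel(A, beta=0.5)
features = kernel_weighted_votes(K, labels=[1, -1, -1, -1, 0])
```

In this toy network, protein 1 sits next to the positively annotated protein 0, so its first feature outweighs its second, while protein 3 shows the reverse. Features of this kind could then be passed to any logistic regression fit; that step is omitted here.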

Keywords: NetLoc; protein localisation prediction; protein-protein interaction; PPI networks; genetic networks; co-expression networks; kernel-based logistic regression; diffusion kernel; data mining; bioinformatics; protein subcellular localisation.

DOI: 10.1504/IJDMB.2014.062146

International Journal of Data Mining and Bioinformatics, 2014 Vol.9 No.4, pp.386 - 400

Received: 15 Apr 2011
Accepted: 15 Apr 2011

Published online: 21 Oct 2014
