Title: StruLocPred: structure-based protein subcellular localisation prediction using multi-class support vector machine

Authors: Wengang Zhou; Julie A. Dickerson

Addresses: Bioinformatics and Computational Biology Program, Electrical and Computer Engineering Department, Virtual Reality Applications Center, Iowa State University, Ames, IA 50011, USA (both authors)

Abstract: Knowledge of protein subcellular locations can help decipher a protein's biological function. This work proposes three new features: one sequence-based feature, Hybrid Amino Acid Pair (HAAP), and two structure-based features, Secondary Structural Element Composition (SSEC) and solvent accessibility state frequency. A multi-class Support Vector Machine is developed to predict the locations. Testing on two established data sets yields better prediction accuracies than the best available systems. Comparisons with existing methods show comparable results to ESLPred2. When StruLocPred is applied to the entire Arabidopsis proteome, over 77% of proteins with known locations match the prediction results. An implementation of this system is available at http://wgzhou.ece.iastate.edu/StruLocPred/.

Keywords: protein subcellular localisation; structural features; multi-class SVM; support vector machine; Arabidopsis proteome; StruLocPred server; protein function; bioinformatics.

DOI: 10.1504/IJDMB.2012.048173

International Journal of Data Mining and Bioinformatics, 2012 Vol.6 No.2, pp.130 - 143

Received: 06 Aug 2009
Accepted: 31 May 2010

Published online: 17 Dec 2014
