Effective framework for protein structure prediction
Online publication date: Tue, 20-Nov-2012
by Nagamma Patil; Durga Toshniwal; Kumkum Garg
International Journal of Functional Informatics and Personalised Medicine (IJFIPM), Vol. 4, No. 1, 2012
Abstract: This paper presents a computational system for predicting protein structure using N-grams and a wrapper feature selection framework. An N-gram is a subsequence of N characters extracted from a larger sequence. N-gram features are extracted from a dataset of 277 domains: 70 all-α, 61 all-β, 81 α/β and 65 α + β domains. A wrapper feature selection system, GA-SVM, is applied to obtain an optimised feature set. A Support Vector Machine (SVM) classifier is then trained and tested on the optimised subset of 3,070 features. Evaluated by 10-fold cross-validation, this model achieves an overall accuracy of 88.09%, which is 4.7% higher than the accuracy obtained with the initial 6,414 features. The experimental results also show that feature subset selection with the proposed GA-SVM wrapper improves classification accuracy compared with other GA-based wrapper approaches and existing protein sequence encoding methods.
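The N-gram definition in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy sequence, and the choice of N = 1 to 3 are assumptions made for illustration only, since the abstract does not state which values of N or which counting scheme were used.

```python
from collections import Counter


def extract_ngrams(sequence, n):
    """Return every contiguous subsequence of length n, in order of appearance."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def ngram_counts(sequence, max_n=3):
    """Count all N-grams for N = 1..max_n (max_n = 3 is an assumed setting)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(extract_ngrams(sequence, n))
    return counts


def feature_vector(sequence, vocabulary, max_n=3):
    """Map a sequence to N-gram counts in a fixed vocabulary order."""
    counts = ngram_counts(sequence, max_n)
    return [counts[gram] for gram in vocabulary]


# Hypothetical toy amino-acid sequence, not taken from the paper's dataset.
toy = "MKVLAAGMKV"
counts = ngram_counts(toy)
vocabulary = sorted(counts)          # one feature per distinct N-gram
vec = feature_vector(toy, vocabulary)
```

Building the vocabulary from the N-grams observed across a whole dataset would give each domain a fixed-length count vector, which is the kind of input a feature selection step and an SVM classifier can then work with.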