Title: An improved method to enhance protein structural class prediction using their secondary structure sequences and genetic algorithm
Authors: Mohammed Hasan Aldulaimi; Suhaila Zainudin; Azuraliza Abu Bakar
Addresses: Data Mining and Optimization Research Group, Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia; General Directorate of Education, Babylon University, Babylon 00964, Iraq ' Data Mining and Optimization Research Group, Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia ' Data Mining and Optimization Research Group, Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia
Abstract: Many approaches have been proposed to improve the accuracy of protein structural class prediction. However, such approaches did not address low-similarity sequences, which have proved quite challenging. In this study, a 71-dimensional integrated feature vector is extracted from the predicted secondary structure and hydropathy sequence using newly devised strategies, in order to categorise proteins into their major structural classes: all-α, all-β, α/β and α+β. A new combined method using two machine learning algorithms is proposed for feature selection. A support vector machine (SVM) and a genetic algorithm (GA) are combined using the wrapper method to select the top N features according to their importance. The proposed method is evaluated using the jackknife test on two low-similarity sequence datasets, ASTRAL and D640. Overall accuracies of 83.93% and 92.2% are reported for the ASTRALtesting and D640 benchmarks, respectively, exceeding most current approaches.
Keywords: feature selection; genetic algorithm; hydropathical information; low-similarity; secondary structure sequence; support vector machine.
DOI: 10.1504/IJBRA.2018.094965
International Journal of Bioinformatics Research and Applications, 2018 Vol.14 No.4, pp.376 - 400
Received: 24 Aug 2016
Accepted: 28 Jan 2017
Published online: 28 Sep 2018
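
Illustrative sketch: the abstract describes a wrapper in which a genetic algorithm proposes feature subsets and a classifier scores each subset under the jackknife (leave-one-out) test. The code below is one possible reading of that idea, not the authors' implementation. To stay dependency-free it assumes a nearest-centroid classifier as a stand-in for the SVM, and every name, parameter value and data point in it (`loo_accuracy`, `ga_select`, the population size, the toy data) is hypothetical.

```python
import random

def loo_accuracy(X, y, mask):
    """Jackknife (leave-one-out) accuracy of a nearest-centroid classifier
    restricted to the features where mask[j] == 1.
    Assumption: nearest-centroid replaces the SVM used in the paper."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Build per-class feature sums from every sample except sample i.
        sums, counts = {}, {}
        for k, (row, label) in enumerate(zip(X, y)):
            if k == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for c, j in enumerate(cols):
                acc[c] += row[j]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared Euclidean distance from sample i to the class centroid.
            return sum((X[i][j] - sums[label][c] / counts[label]) ** 2
                       for c, j in enumerate(cols))

        correct += min(sums, key=dist) == y[i]
    return correct / len(X)

def ga_select(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Wrapper feature selection: each chromosome is a 0/1 mask over features
    and its fitness is the jackknife accuracy on the masked features."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return loo_accuracy(X, y, mask)

    # Start from the full feature set plus random subsets.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)[:]]  # elitism: carry the best mask over
        while len(nxt) < pop_size:
            # Binary tournament selection of two parents.
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)

# Synthetic toy data, for illustration only: feature 0 tracks the class label,
# features 1-3 are noise. These are not values from ASTRAL or D640.
data_rng = random.Random(1)
X = [[label + data_rng.gauss(0, 0.1)] + [data_rng.gauss(0, 5) for _ in range(3)]
     for label in (0, 1) for _ in range(10)]
y = [label for label in (0, 1) for _ in range(10)]

mask, acc = ga_select(X, y)
print("selected mask:", mask, "jackknife accuracy:", acc)
```

Because the initial population contains the all-features mask and the best mask is always carried into the next generation, the returned subset never scores below the full feature set under this fitness function. Selecting a fixed "top N" features, as the abstract describes, would need an additional ranking or size constraint that this sketch does not attempt.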