The three-dimensional structure of proteins enables them to carry out their biophysical and biochemical functions in a cell. Approaches to protein structure/fold classification typically extract amino acid sequence features and then apply machine learning to the classification problem. Protein contact maps are two-dimensional representations of the contacts among amino acid residues in the folded protein structure. Many researchers have noted that secondary structures are clearly visible in contact maps: alpha-helices appear as thick bands along the main diagonal and beta-sheets as bands orthogonal to the diagonal. Some patterns in the off-diagonal regions of contact maps correspond to configurations of protein secondary structures. This paper explores the idea of extracting rules from contact maps to represent fold information. Contact maps are generated for proteins of any length. An efficient method is adopted to extract Secondary Structure Elements (SSEs) from the contact maps, and it achieves appreciable performance when compared with the original SSEs. Frequent substructures are extracted with SUBDUE, a graph-based pattern learning system, for six folds in the All-Alpha structural class. The extracted substructures are mapped to the three-dimensional structure, which demonstrates the performance of the approach. To extract additional features from the off-diagonal contact map, the Triangle Subdivision Method is implemented and the feature set is extended to 20 regions of interest. The J48 decision tree classifier achieves an accuracy of 70%. The decision tree results also give an understanding of the rules generated for each structural class. The differences in regions of interest are distinguished for the All-Alpha structural class. The method still needs to be validated on other SCOP classes.
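To make the notion of a contact map concrete, the following is a minimal sketch of how one could be computed from residue coordinates. It is not the authors' implementation: the function name `contact_map`, the use of one representative point per residue, and the 8 Å distance cutoff are all assumptions chosen for illustration, since the abstract does not state how contacts are defined.

```python
import math

def contact_map(coords, threshold=8.0):
    """Return a symmetric 0/1 matrix for a list of 3D points.

    Entry [i][j] is 1 when points i and j lie within `threshold` of each
    other. The 8.0 default is an assumed cutoff, not taken from the paper.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Toy input: three pseudo-residues on a line (not real protein coordinates).
toy = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(toy)
for row in cmap:
    print(row)
```

Because the function loops over all residue pairs, it works for a chain of any length, at a cost that grows quadratically with the number of residues.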
The three-dimensional structure of proteins enables them to carry out their biophysical and biochemical functions in a cell. Protein contact maps are 2D representations of the contacts among amino acid residues in the folded protein structure. Proteins are biochemical compounds consisting of one or more polypeptides that facilitate a biological function. Many researchers have noted that secondary structures are clearly visible in contact maps: helices appear as thick bands along the main diagonal and sheets as bands orthogonal to the diagonal. In this paper, we explore several machine learning algorithms for the data-driven construction of classifiers that assign protein off-diagonal contact maps to folds. A simple and computationally inexpensive algorithm based on the triangle subdivision method is implemented to extract twenty features from off-diagonal contact maps. This method successfully characterizes the off-diagonal interactions in the contact map for predicting specific folds. The NaiveBayes, J48, and REPTree classifiers achieve recall of 76.38%, 91.66%, and 80.32%, respectively.
General Terms: Protein Contact Map.
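The abstract does not spell out how the triangle subdivision method partitions the contact map, so the following is only a hedged sketch of one plausible reading: the upper triangle is split at its midpoint into two smaller triangles and one off-diagonal block, recursively, and the contact density of each region becomes a feature. The function names, the `depth` parameter, and the use of contact density are assumptions. With this scheme a depth of d yields 2^(d+1) - 1 regions, so it does not reproduce the paper's twenty regions of interest.

```python
def density(cmap, cells):
    """Fraction of the given (i, j) cells that are contacts; 0.0 if empty."""
    cells = list(cells)
    if not cells:
        return 0.0
    return sum(cmap[i][j] for i, j in cells) / len(cells)

def triangle_features(cmap, lo, hi, depth):
    """Contact densities for a recursive split of the triangle lo <= i < j < hi.

    Hypothetical scheme: each split emits the off-diagonal block between the
    two halves, then recurses into the two smaller triangles.
    """
    if depth == 0 or hi - lo < 2:
        # Leaf: density of whatever remains of this triangle.
        leaf = ((i, j) for i in range(lo, hi) for j in range(i + 1, hi))
        return [density(cmap, leaf)]
    mid = (lo + hi) // 2
    block = ((i, j) for i in range(lo, mid) for j in range(mid, hi))
    return ([density(cmap, block)]
            + triangle_features(cmap, lo, mid, depth - 1)
            + triangle_features(cmap, mid, hi, depth - 1))

# Toy 4x4 symmetric contact map (illustrative values, not real data).
toy_map = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
]
features = triangle_features(toy_map, 0, len(toy_map), depth=1)
print(features)
```

Using densities rather than raw counts keeps the features comparable across proteins of different lengths, which matters if the resulting vectors are passed to classifiers such as those named in the abstract.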