The three-dimensional structure of a protein underlies its biophysical and biochemical functions in the cell. Approaches to protein structure/fold classification typically extract features from the amino acid sequence and then apply machine-learning methods to the classification problem. Protein contact maps are two-dimensional representations of the contacts among amino acid residues in the folded protein structure. Many researchers have noted that secondary structures are clearly visible in contact maps: alpha-helices appear as thick bands along the main diagonal, and beta-sheets as bands orthogonal to the diagonal. Some patterns in the off-diagonal regions of contact maps correspond to configurations of protein secondary structures. This paper explores the idea of extracting rules from contact maps to represent fold information. Contact maps are generated for proteins of any length, and an efficient method for extracting Secondary Structure Elements (SSEs) from contact maps is adopted; it achieves appreciable performance when compared with the original SSEs. Frequent substructures are extracted from six folds of the All-Alpha structural class using SUBDUE, a graph-based pattern learning system. Mapping the extracted substructures back onto the three-dimensional structures supports the validity of the approach. To extract additional features from the off-diagonal regions of the contact map, a Triangle Sub Division Method is implemented, enlarging the feature set to 20 regions of interest. The J48 decision tree classifier achieves an accuracy of 70%, and its results give insight into the rules generated for each structural class. Differences among the regions of interest are distinguished for the All-Alpha structural class. The method still needs to be validated on the other SCOP classes.
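As a rough illustration of the first two steps (building a contact map and reading helices off it), the sketch below computes a binary contact map from C-alpha coordinates and flags helix-like segments. It is not the paper's SSE extraction method. The 8 Å distance cutoff, the rule "residue i contacts both i+3 and i+4", and the function names `contact_map` and `helix_segments` are all illustrative assumptions.

```python
import numpy as np

def contact_map(coords, threshold=8.0):
    """Binary contact map from an (N, 3) array of C-alpha coordinates.

    Assumption: residues i and j are 'in contact' when their C-alpha
    distance is below `threshold` angstroms (8 A is a common, not universal, choice).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]   # pairwise difference vectors
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # (N, N) distance matrix
    return (dist < threshold).astype(np.uint8)

def helix_segments(cmap, min_len=5):
    """Hypothetical heuristic: an alpha-helix shows up as a thick band next to
    the main diagonal, because residue i contacts i+3 and i+4. Mark those
    stretches as helical and return runs of at least `min_len` residues
    as inclusive (start, end) index pairs.
    """
    n = cmap.shape[0]
    helical = np.zeros(n, dtype=bool)
    for i in range(n - 4):
        if cmap[i, i + 3] and cmap[i, i + 4]:
            helical[i:i + 5] = True
    segments, start = [], None
    for i, h in enumerate(helical):
        if h and start is None:
            start = i                                  # a helical run begins
        elif not h and start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))        # close the run
            start = None
    if start is not None and n - start >= min_len:
        segments.append((start, n - 1))                # run reaches the chain end
    return segments
```

For an idealized 20-residue helix (radius 2.3 Å, 100° turn and 1.5 Å rise per residue), the i→i+3 and i→i+4 distances are roughly 5.0 Å and 6.2 Å, so `helix_segments(contact_map(coords))` reports the whole chain as one segment. A straight strand with 3.8 Å spacing puts i+3 at about 11 Å and yields no segments. Beta-sheets would need a separate rule for the off-diagonal bands, which this sketch does not attempt.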