A long-standing goal in protein structure studies is the development of reliable energy functions that can be used both to verify protein models derived from experimental constraints and to carry out theoretical protein folding and inverse folding computer experiments. In that respect, knowledge-based statistical pair potentials have recently attracted considerable interest, mainly because they include the essential features of protein structures as well as solvent effects at a low computing cost. However, the basis on which statistical potentials are derived has been questioned. In this paper, we investigate statistical pair potentials derived from protein three-dimensional structures, addressing in particular questions related to the form of these potentials, as well as to the content of the database from which they are derived. We have shown that statistical pair potentials depend on the size of the proteins included in the database, and that this dependence can be reduced by considering only pairs of residues close in space (i.e., with a cutoff of 8 Å). We have also shown that statistical potentials carry a memory of the quality of the database in terms of the amount and diversity of secondary structure it contains. We find, for example, that potentials derived from a database containing alpha-proteins will perform best only on alpha-proteins in fold recognition computer experiments. We believe that this is an overall weakness of these potentials, which must be kept in mind when constructing a database.
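To make the idea of a distance-cutoff pair potential concrete, the following is a minimal sketch of one common way such a potential can be derived, assuming an inverse-Boltzmann form with a composition-based reference state. The abstract does not specify the functional form used by the authors, so the function name `contact_potential`, the `min_sep` and `pseudo` parameters, and the one-point-per-residue input format are all illustrative assumptions; only the 8 Å cutoff comes from the text.

```python
import math
from collections import Counter
from itertools import combinations


def contact_potential(structures, cutoff=8.0, min_sep=2, pseudo=1.0):
    """Sketch of a contact-based statistical pair potential (units of kT).

    structures: list of chains, each a list of (residue_type, (x, y, z)),
                one representative point per residue (hypothetical format).
    cutoff:     contact distance in angstroms (8 A, as in the text).
    min_sep:    minimum sequence separation for a pair (assumption).
    pseudo:     pseudocount so unseen pairs stay finite (assumption).
    """
    pair_counts = Counter()
    type_counts = Counter()
    for chain in structures:
        for res_type, _ in chain:
            type_counts[res_type] += 1
        for (i, (a, pa)), (j, (b, pb)) in combinations(enumerate(chain), 2):
            if j - i < min_sep:
                continue
            if math.dist(pa, pb) <= cutoff:
                pair_counts[tuple(sorted((a, b)))] += 1

    types = sorted(type_counts)
    n_keys = len(types) * (len(types) + 1) // 2
    total_pairs = sum(pair_counts.values())
    total_res = sum(type_counts.values())

    potential = {}
    for idx, a in enumerate(types):
        for b in types[idx:]:
            # Reference state: contacts expected from composition alone.
            x_a = type_counts[a] / total_res
            x_b = type_counts[b] / total_res
            expected = x_a * x_b * (1 if a == b else 2)
            observed = (pair_counts[(a, b)] + pseudo) / (total_pairs + pseudo * n_keys)
            # Inverse Boltzmann: frequent contacts get a favorable (negative) score.
            potential[(a, b)] = -math.log(observed / expected)
    return potential
```

Because both the contact counts and the residue composition are taken from the input structures, the scores returned by a sketch like this inherit whatever bias the database has, which is the dependence the abstract warns about.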
We define a hypergraph representation of a protein which captures its tertiary structure in a loose way. Using the notions of hypergraphs, we define a conformation rule as a kind of hypergraph rewriting. Given a conformation rule, we describe a procedure for deriving a conformation from a sequence. We then discuss a method of learning a conformation rule from a collection of hypergraph representations of proteins. A polynomial-time PAC-learning algorithm is shown for a class of conformations. This algorithm is now being implemented in Common Lisp for experiments.