The protein topology database KnotProt, http://knotprot.cent.uw.edu.pl/, collects information about protein structures with open polypeptide chains forming knots or slipknots. The knotting complexity of the cataloged proteins is presented in the form of a matrix diagram that shows users the knot type of the entire polypeptide chain and of each of its subchains. The pattern visible in the matrix gives the knotting fingerprint of a given protein and permits users to determine, for example, the minimal length of the knotted regions (knot's core size) or the depth of a knot, i.e. how many amino acids can be removed from either end of the cataloged protein structure before converting it from a knot to a different type of knot. In addition, the database presents extensive information about the biological functions, families and fold types of proteins with non-trivial knotting. As an additional feature, the KnotProt database enables users to submit protein or polymer chains and generate their knotting fingerprints.
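As a rough illustration of how a knot core and knot depth could be read off such a matrix, the sketch below assumes the fingerprint is available as a Python dictionary mapping each subchain `(i, j)` to a knot label. This layout, the function names and the toy data are hypothetical and are not KnotProt's actual data format or API. The sketch also assumes the full chain is knotted, so it does not treat slipknots, where only a subchain is knotted.

```python
def toy_fingerprint(n, core_start, core_end, knot="3_1"):
    """Build a synthetic fingerprint (illustrative only, not real data).

    A subchain i..j (0-based, inclusive) is labelled `knot` when it
    contains the whole segment core_start..core_end, else "0_1" (unknot).
    """
    return {
        (i, j): knot if i <= core_start and j >= core_end else "0_1"
        for i in range(n)
        for j in range(i, n)
    }


def knot_core_and_depth(fingerprint, n):
    """Return (core, n_depth, c_depth) for a chain of n residues.

    core    : shortest subchain with the same knot type as the full chain
    n_depth : residues removable from the N end before the type changes
    c_depth : residues removable from the C end before the type changes
    """
    full = fingerprint[(0, n - 1)]

    # Knot core: the shortest subchain that keeps the full-chain knot type.
    core = min(
        (ij for ij, label in fingerprint.items() if label == full),
        key=lambda ij: (ij[1] - ij[0], ij[0]),
    )

    # Clip residues from the N end while the C end stays intact.
    n_depth = 0
    while n_depth + 1 < n and fingerprint.get((n_depth + 1, n - 1)) == full:
        n_depth += 1

    # Clip residues from the C end while the N end stays intact.
    c_depth = 0
    while n - 2 - c_depth >= 0 and fingerprint.get((0, n - 2 - c_depth)) == full:
        c_depth += 1

    return core, n_depth, c_depth


fp = toy_fingerprint(10, 3, 6)
print(knot_core_and_depth(fp, 10))  # ((3, 6), 3, 3)
```

In this toy case the knot core spans residues 3 to 6, and three residues can be clipped from either end before the chain becomes unknotted.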
We introduce a theoretical framework that exploits the ever-increasing genomic sequence information for protein structure prediction. Structure-based models are modified to incorporate constraints from a large number of non-local contacts estimated by direct coupling analysis (DCA) of co-evolving genomic sequences. A simple hybrid method, called DCA-fold, integrating DCA contacts with an accurate knowledge of local information (e.g., the local secondary structure) is sufficient to fold proteins in the range of 1-3 Å resolution.
protein folding | residue contact prediction | contact map estimation | residue-residue coevolution | statistical potentials
Proteins are heteropolymers of amino acids that adopt specific 3D structures to perform designated biological tasks. Enormous experimental efforts have been invested to determine a large number of protein structures. Currently, computational structure prediction methods are reasonably successful in describing interactions among residues close (local) in sequence. Given the limited information for residues that are distant in sequence, success in large-scale structure prediction has depended crucially on known structural motifs available in protein databases. In cases where similarity to proteins of known structures exists, methods like fold recognition and homology modeling (1-3) have proven successful and effective, according to the Critical Assessment of Techniques for Protein Structure Prediction (4). Nevertheless, the accuracy of these methods is still in many cases far from the resolution needed to explore protein functions.
Here we introduce a new computational approach that exploits information from the rapidly growing genomic sequences to complement the currently limited structural databases. Over the years, a variety of methods has been used, with mixed success, to study co-evolution in protein sequences and to estimate residue contacts (5-11).
Recently, methods based on direct coupling analysis (DCA) (12) were shown to predict 50-300 non-local contacts with 70-80% accuracy for a variety of protein domains (13). DCA is based purely on protein sequence information. It uses covariance in homologous protein sequences as an input and deduces direct interactions between residues (12). Residue pairs with strong direct interactions are shown to be related to structurally conserved residue-residue contacts in the protein fold (12, 13). As the contacts predicted by DCA recapitulate major features of the native contact maps, we developed a simple hybrid method integrating DCA contacts and detailed local information, to fold proteins of up to about 200 amino acids to within 3 Å of the native structures.
Our methodology is guided by the energy landscape theory (14), which asserts that in a minimally frustrated, funnel-like energy landscape, native contacts are on average favorable and dominant over non-favorable, non-native ones. This drives proteins smoothly toward their native states. Folding simulations, using native contacts in structure-based models (SBM), have been...
A new theoretical survey of proteins' resistance to constant speed stretching is performed for a set of 17 134 proteins as described by a structure-based model. The proteins selected have no gaps in their structure determination and consist of no more than 250 amino acids. Our previous studies have dealt with 7510 proteins of no more than 150 amino acids. The proteins are ranked according to the strength of the resistance. Most of the predicted top-strength proteins have not yet been studied experimentally. Architectures and folds which are likely to yield large forces are identified. New types of potent force clamps are discovered. They involve disulphide bridges and, in particular, cysteine slipknots. An effective energy parameter of the model is estimated by comparing the theoretical data on characteristic forces to the corresponding experimental values, combined with an extrapolation of the theoretical data to the experimental pulling speeds. These studies provide guidance for future experiments on single molecule manipulation and should lead to selection of proteins for applications. A new class of proteins, involving cysteine slipknots, is identified as one that is expected to lead to the strongest force clamps known. This class is characterized through molecular dynamics simulations.
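The extrapolation of simulated forces to experimental pulling speeds can be sketched as follows. The abstract does not state the functional form used, so this example assumes a logarithmic dependence of the characteristic force on pulling speed, F = p ln(v) + q, which is a common choice for such data. The function names are hypothetical, and the input values are synthetic numbers generated from the assumed formula, not results from the survey.

```python
import math


def fit_log_speed(speeds, forces):
    """Least-squares fit of the assumed form F = p * ln(v) + q."""
    xs = [math.log(v) for v in speeds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_f = sum(forces) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxf = sum((x - mean_x) * (f - mean_f) for x, f in zip(xs, forces))
    p = sxf / sxx
    q = mean_f - p * mean_x
    return p, q


def extrapolate_force(p, q, speed):
    """Evaluate the fitted line at a (typically much slower) pulling speed."""
    return p * math.log(speed) + q


# Synthetic input: forces generated from known coefficients (arbitrary units).
true_p, true_q = 0.2, 3.0
sim_speeds = [0.001, 0.005, 0.01]
sim_forces = [true_p * math.log(v) + true_q for v in sim_speeds]

p, q = fit_log_speed(sim_speeds, sim_forces)
slow_force = extrapolate_force(p, q, 1e-6)  # force at a slower, experiment-like speed
```

Comparing such an extrapolated force, expressed in model units, with a measured force would then give one estimate of the energy scale, under the stated assumptions.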