A wide variety of computational algorithms have been developed that strive to capture the chemical similarity between two compounds for use in virtual screening and lead discovery. One limitation of such approaches is that, while a returned similarity value reflects the perceived degree of relatedness between any two compounds, there is no direct correlation between this value and the expectation or confidence that the two molecules will in fact be equally active. The lack of a common framework for interpreting similarity measures also confounds the reliable fusion of information from different algorithms. Here, we present a probabilistic framework for interpreting similarity measures that directly relates the similarity value to a quantitative expectation that two molecules will in fact be equipotent. The approach is based on extensive benchmarking of 10 different similarity methods (MACCS keys, Daylight fingerprints, maximum common subgraphs, rapid overlay of chemical structures (ROCS) shape similarity, and six connectivity-based fingerprints) against a database of more than 150,000 compounds with activity data against 23 protein targets. Given this unified and probabilistic framework for interpreting chemical similarity, principles derived from decision theory can then be applied to combine the evidence from different similarity measures in a way that both capitalizes on the strengths of the individual approaches and maintains a quantitative estimate of the likelihood that any two molecules will exhibit similar biological activity.
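The abstract does not spell out how similarity values are calibrated or how the evidence is combined. As a rough illustration only, the sketch below bins benchmark compound pairs by their similarity score, estimates the fraction of equipotent pairs in each bin, and then fuses the calibrated probabilities from several methods by adding log-odds shifts relative to a prior (the base rate of equipotent pairs in the benchmark). The binning, the Laplace smoothing, the function names, and the naive-Bayes independence assumption are all ours, not the authors'; correlated fingerprints will violate that independence assumption.

```python
import math
from bisect import bisect_right


def _bin_index(edges, s):
    """Index of the bin containing s; values outside the edges are clamped."""
    return max(0, min(bisect_right(edges, s) - 1, len(edges) - 2))


def calibrate(similarities, equipotent, edges):
    """Estimate P(equipotent | similarity bin) from benchmark compound pairs.

    similarities: similarity scores for pairs, all from one method
    equipotent:   parallel booleans, True if the pair shares the activity
    edges:        sorted bin edges, e.g. [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    Returns one Laplace-smoothed probability per bin.
    """
    n_bins = len(edges) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for s, y in zip(similarities, equipotent):
        b = _bin_index(edges, s)
        totals[b] += 1
        hits[b] += int(y)
    return [(h + 1) / (t + 2) for h, t in zip(hits, totals)]


def probability(table, edges, s):
    """Look up the calibrated probability for a raw similarity value s."""
    return table[_bin_index(edges, s)]


def _logit(p):
    return math.log(p / (1.0 - p))


def fuse(probs, prior):
    """Combine per-method probabilities by adding their log-odds shifts
    relative to the prior. ASSUMES the methods are conditionally
    independent given the outcome (naive Bayes)."""
    z = _logit(prior) + sum(_logit(p) - _logit(prior) for p in probs)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    edges = [0.0, 0.5, 1.0]
    table = calibrate([0.1, 0.15, 0.9, 0.95], [False, False, True, True], edges)
    print(table)                    # [0.25, 0.75]
    print(fuse([0.8, 0.8], 0.5))    # about 0.941: two agreeing methods
```

Adding log-odds keeps the fused result on the same probability scale as each input, which is the property the abstract emphasizes; a method whose calibrated probability equals the prior contributes nothing to the fused estimate.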
We present the results of sequence design on our off-lattice minimalist model, in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that are distinct members of a target fold class. In this work, we use the α/β ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that the patterning of hydrophobic and hydrophilic residues is the physical origin of the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). To illustrate future extensions to protein design, we also present results of mapping from the 20-letter to the three-letter code to determine a sequence that folds into the WW domain topology.

An important insight into the protein folding problem is the recognition that native-state topology often plays a dominant role in the kinetics of the folding process (1, 2). This concept implies that the subtle interactions among the 20 different amino acids, which give rise to cooperative formation of native structure through backbone hydrogen bonding and specific side-chain packing of the native-state core, can often be suppressed and effectively replaced by coarse-grained descriptions that capture the overall topology and the spatial distribution of local and nonlocal contacts.

Minimalist proteins are coarse-grained models that use an α-carbon trace to represent the protein backbone; the structural details of the amino acids and the aqueous solvent are integrated out and replaced with effective bead-bead interactions.
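Relative contact order, mentioned in the abstract above, is commonly defined as RCO = (1/(L·N)) Σ |i − j|, where the sum runs over the N native contacts between residues i and j and L is the chain length; low values indicate mostly local contacts, high values mostly nonlocal ones. The sketch below computes it for an α-carbon trace. The contact definition (a Cα–Cα cutoff of 8 length units and a minimum sequence separation of 3) is a hypothetical choice for illustration; the standard definition counts atom-level contacts, which a bead model does not have.

```python
import math


def native_contacts(coords, cutoff=8.0, min_sep=3):
    """HYPOTHETICAL contact definition for an alpha-carbon trace: residues
    i < j are in contact if j - i >= min_sep and their CA-CA distance is
    below `cutoff` (same length units as coords)."""
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_sep, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.append((i, j))
    return contacts


def relative_contact_order(contacts, length):
    """Relative contact order: mean sequence separation |i - j| of the
    native contacts, divided by the chain length L."""
    contacts = list(contacts)
    if not contacts:
        raise ValueError("at least one contact is required")
    mean_separation = sum(abs(i - j) for i, j in contacts) / len(contacts)
    return mean_separation / length
```

For example, two contacts (0, 4) and (2, 10) in a 20-residue chain have a mean separation of 6, giving an RCO of 0.3.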
In minimalist models, the potential energy functions for bead-bead interactions are typically Go model potentials, which require direct knowledge of native-state tertiary contacts (3). These models are particularly useful for studying proteins in which sequence matters less than native-state topology in determining folding rate and mechanism.

However, Go bead models avoid the more difficult aspect of the protein folding problem, namely its dependence on amino acid sequence. It is therefore not surprising that these idealized models can lack a quantitative connection to experiment in some cases (4, 5). Physics-based models return to the original problem, confronting the complexity of amino acid sequence and the corresponding interplay of physical interactions that give rise to the particulars of protein stability and kinetics; they therefore do not require knowledge of native-state tertiary contacts (6-11). They are more generally applicable, especially when sequence details are as important as topology, as is the case for two members of the ubiquitin fold class, the Ig-binding protein...
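To make concrete what "direct knowledge of native-state tertiary contacts" means for a Go model, here is a minimal sketch of one common Go-type nonbonded energy for a bead model. It is not the potential used in this work. Native pairs receive a 12-10 well whose minimum (−ε) sits exactly at the native distance, and all other pairs are purely repulsive. The `native` dictionary is precisely the input that, per the text, physics-based models do not need. The function name, parameter defaults, and repulsive radius are illustrative assumptions; bonded (bond, angle, dihedral) terms are omitted.

```python
import math


def go_energy(coords, native, epsilon=1.0, sigma_rep=4.0, min_sep=3):
    """Sketch of a Go-type nonbonded energy for an alpha-carbon trace.

    coords:  list of (x, y, z) bead positions
    native:  dict mapping native-contact pairs (i, j), i < j, to their
             CA-CA distance in the native structure
    Native pairs get a 12-10 well whose minimum, -epsilon, sits exactly at
    the native distance; every other pair with j - i >= min_sep feels only
    a repulsive (sigma_rep / r)**12 term.
    """
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + min_sep, n):
            r = math.dist(coords[i], coords[j])
            if (i, j) in native:
                x = native[(i, j)] / r
                energy += epsilon * (5 * x**12 - 6 * x**10)
            else:
                energy += epsilon * (sigma_rep / r) ** 12
    return energy
```

Because every attractive term is tied to a native distance, the energy minimum is the native structure by construction, which is why such models capture topology but say nothing about which sequence produces it.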