Virtual screening benchmarking studies were carried out on 11 targets to evaluate the performance of three commonly used approaches: 2D ligand similarity (Daylight, TOPOSIM), 3D ligand similarity (SQW, ROCS), and protein structure-based docking (FLOG, FRED, Glide). Active and decoy compound sets were assembled from both the MDDR and the Merck compound databases. Averaged over multiple targets, ligand-based methods outperformed docking algorithms. This was true for 3D ligand-based methods only when chemical typing was included. Using mean enrichment factor as a performance metric, Glide appears to be the best docking method among the three with FRED a close second. Results for all virtual screening methods are database dependent and can vary greatly for particular targets.
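The abstract names mean enrichment factor as its performance metric but does not define it. The sketch below uses one common definition (the hit rate among the top-ranked fraction of the database divided by the hit rate in the whole database) and averages it over targets. The function names, the `top_fraction` parameter, and the convention that higher scores rank first are assumptions for illustration, not details taken from the study.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """Hit rate in the top-ranked fraction divided by the overall hit rate.

    scores: one score per compound, higher = predicted more likely active (assumed).
    is_active: parallel list of booleans marking the known actives.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Number of compounds in the selected top slice (at least one).
    n_top = max(1, round(n_total * top_fraction))

    # Rank compounds from best to worst score and count actives in the slice.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, flag in ranked[:n_top] if flag)

    return (hits / n_top) / (n_actives / n_total)


def mean_enrichment(per_target_values):
    """Arithmetic mean of enrichment factors over several targets."""
    values = list(per_target_values)
    return sum(values) / len(values)
```

An enrichment factor of 1 corresponds to random selection. Because the value is bounded by the ratio of decoys to actives, comparisons across databases with different compositions should be read with care, which is consistent with the database dependence noted above.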
How well can a QSAR model predict the activity of a molecule not in the training set used to create the model? A set of retrospective cross-validation experiments using 20 diverse in-house activity sets was done to find a good discriminator of prediction accuracy as measured by root-mean-square difference between observed and predicted activity. Among the measures we tested, two seem useful: the similarity of the molecule to be predicted to the nearest molecule in the training set and/or the number of neighbors in the training set, where neighbors are those more similar than a user-chosen cutoff. The molecules with the highest similarity and/or the most neighbors are the best-predicted. This trend holds true for narrow training sets and, to a lesser degree, for many diverse training sets and does not depend on which QSAR method or descriptor is used. One may define the similarity using a different descriptor than that used for the QSAR model. The similarity dependence for diverse training sets is somewhat unexpected. It appears to be greater for those data sets where the association of similar activities vs similar structures (as encoded in the Patterson plot) is stronger. We propose a way to estimate the reliability of the prediction of an arbitrary chemical structure on a given QSAR model, given the training set from which the model was derived.
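The two measures described above (similarity to the nearest training-set molecule, and the number of training-set neighbors above a cutoff) can be sketched as follows. The abstract does not say which similarity index or descriptor was used, so this example assumes Tanimoto similarity over sets of integer feature IDs; the function names and the default cutoff of 0.7 are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto similarity between two sets of descriptor feature IDs."""
    a, b = set(features_a), set(features_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def neighborhood_measures(query, training_set, cutoff=0.7):
    """Return (similarity to nearest training molecule, number of neighbors).

    A neighbor is a training molecule more similar to the query than `cutoff`.
    """
    similarities = [tanimoto(query, molecule) for molecule in training_set]
    nearest = max(similarities, default=0.0)
    n_neighbors = sum(1 for s in similarities if s > cutoff)
    return nearest, n_neighbors
```

Under the trend reported above, a query with a high nearest-neighbor similarity or a large neighbor count would be expected to have a smaller prediction error than one with neither.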
In the study of globular protein conformations, one customarily measures the similarity in three-dimensional structure by the root-mean-square deviation (RMSD) of the C alpha atomic coordinates after optimal rigid body superposition. Even when the two protein structures each consist of a single chain having the same number of residues so that the matching of C alpha atoms is obvious, it is not clear how to interpret the RMSD. A very large value means they are dissimilar, and zero means they are identical in conformation, but at what intermediate values are they particularly similar or clearly dissimilar? While many workers in the field have chosen arbitrary cutoffs, and others have judged values of RMSD according to the observed distribution of RMSD for random structures, we propose a self-referential, non-statistical standard. We take two conformers to be intrinsically similar if their RMSD is smaller than that when one of them is mirror inverted. Because the structures considered here are not arbitrary configurations of point atoms, but are compact, globular, polypeptide chains, our definition is closely related to similarity in radius of gyration and overall chain folding patterns. Being strongly similar in our sense implies that the radii of gyration must be nearly identical, the root-mean-square deviation in interatomic distances is linearly related to RMSD, and the two chains must have the same general fold. Only when the RMSD exceeds this level can parts of the polypeptide chain undergo nontrivial rearrangements while remaining globular. This enables us to judge when a prediction of a protein's conformation is "correct except for minor perturbations", or when the ensemble of protein structures deduced from NMR experiments is "basically in mutual agreement".
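A minimal sketch of the mirror-inversion criterion, read literally from the abstract: compute the RMSD after optimal rigid-body superposition, then compare it with the RMSD obtained when one structure is mirror inverted. It assumes NumPy is available and that both inputs are N x 3 arrays of matched C alpha coordinates. The function names are hypothetical, and the closed-form RMSD from singular values (the standard Kabsch result) is a convenience chosen here, not necessarily the authors' procedure.

```python
import numpy as np


def superposed_rmsd(coords_a, coords_b):
    """RMSD after optimal proper rotation and translation (no reflection)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two matched N x 3 coordinate arrays")

    # Remove translation by centering both structures on their centroids.
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)

    # Covariance matrix and its singular values.
    h = a0.T @ b0
    s = np.linalg.svd(h, compute_uv=False)

    # For a proper rotation the smallest singular value takes the sign of det(h).
    d = 1.0 if np.linalg.det(h) >= 0.0 else -1.0
    total = (a0 ** 2).sum() + (b0 ** 2).sum()
    msd = (total - 2.0 * (s[0] + s[1] + d * s[2])) / len(a)
    return float(np.sqrt(max(msd, 0.0)))  # clamp tiny negative rounding error


def intrinsically_similar(coords_a, coords_b):
    """True if the RMSD is smaller than the RMSD to the mirror image of b."""
    # Inversion through the origin is a reflection in 3D; every other mirror
    # image differs from it only by a rotation, so the RMSD is the same.
    mirrored_b = -np.asarray(coords_b, dtype=float)
    return superposed_rmsd(coords_a, coords_b) < superposed_rmsd(coords_a, mirrored_b)
```

For example, a chiral set of points compared with a rotated copy of itself gives an RMSD near zero and passes the test, whereas the same set compared with its own mirror image does not.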
We have devised a continuous function of interresidue contacts in globular proteins such that the X-ray crystal structure has a lower function value than that of thousands of protein-like alternative conformations. Although we fit the adjustable parameters of the potential using only 10,000 alternative structures for a selected training set of 37 proteins, a grand total of 530,000 constraints was satisfied, derived from 73 proteins and their numerous alternative conformations. In every case where the native conformation is adequately globular and compact, according to objective criteria we have developed, the potential function always favors the native over all alternatives by a substantial margin. This is true even for an additional three proteins never used in any way in the fitting procedure. Conformations differing only slightly from the native, such as those coming from crystal structures of the same protein complexed with different ligands or from crystal structures of point mutants, have function values very similar to the native's and always less than those of alternatives derived from substantially different crystal structures. This holds for all 95 structures that are homologous to one or another of various proteins we used. Realizing that this potential should be useful for modeling the conformation of new protein sequences from the body of protein crystal structures, we suggest a test for deciding whether a nearly correct approximation to the native conformation has been found.
One approach to estimating the "chemical tractability" of a candidate protein target where we know the atomic resolution structure is to examine the physical properties of potential binding sites. A number of other workers have addressed this issue. We characterize ~290,000 "pockets" from ~42,000 protein crystal structures in terms of a three parameter "pocket space": volume, buriedness, and hydrophobicity. A metric DLID (drug-like density) measures how likely a pocket is to bind a drug-like molecule. This is calculated from the count of other pockets in its local neighborhood in pocket space that contain drug-like cocrystallized ligands and the count of total pockets in the neighborhood. Surprisingly, despite being defined locally, a global trend in DLID can be predicted by a simple linear regression on log(volume), buriedness, and hydrophobicity. Two levels of simplification are necessary to relate the DLID of individual pockets to "targets": taking the best DLID per Protein Data Bank (PDB) entry (because any given crystal structure can have many pockets), and taking the median DLID over all PDB entries for the same target (because different crystal structures of the same protein can vary because of artifacts and real conformational changes). We can show that median DLIDs for targets that are detectably homologous in sequence are reasonably similar and that median DLIDs correlate with the "druggability" estimate of Cheng et al. (Nature Biotechnology 2007, 25, 71-75).
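The two simplification steps described above (best DLID per PDB entry, then median over all PDB entries for the same target) can be sketched as a small aggregation. The input layout of `(target, pdb_id, dlid)` tuples and the function name are hypothetical, and the sketch assumes that "best" means the highest DLID value. The DLID values themselves are taken as given, since the abstract does not state the exact formula.

```python
from collections import defaultdict
from statistics import median


def median_dlid_per_target(pockets):
    """Collapse per-pocket DLID values to one value per target.

    pockets: iterable of (target, pdb_id, dlid) tuples, one per pocket.
    """
    # Step 1: keep the best (assumed highest) DLID within each PDB entry,
    # because a single crystal structure can contain many pockets.
    best_per_entry = {}
    for target, pdb_id, dlid in pockets:
        key = (target, pdb_id)
        if key not in best_per_entry or dlid > best_per_entry[key]:
            best_per_entry[key] = dlid

    # Step 2: take the median over all PDB entries of the same target,
    # which damps artifacts and conformational differences between structures.
    per_target = defaultdict(list)
    for (target, _pdb_id), dlid in best_per_entry.items():
        per_target[target].append(dlid)
    return {target: median(values) for target, values in per_target.items()}
```

Using the median rather than the mean in the second step keeps a single unusual crystal structure from dominating the estimate for a target.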