As part of the Community Structure-Activity Resource (CSAR) center, a set of 343 high-quality, protein–ligand crystal structures were assembled with experimentally determined Kd or Ki information from the literature. We encouraged the community to score the crystallographic poses of the complexes by any method of their choice. The goal of the exercise was to (1) evaluate the current ability of the field to predict activity from structure and (2) investigate the properties of the complexes and methods that appear to hinder scoring. A total of 19 different methods were submitted with numerous parameter variations for a total of 64 sets of scores from 16 participating groups. Linear regression and nonparametric tests were used to correlate scores to the experimental values. Correlation to experiment for the various methods ranged R2 = 0.58–0.12, Spearman ρ = 0.74–0.37, Kendall τ = 0.55–0.25, and median unsigned error = 1.00–1.68 pKd units. All types of scoring functions—force field based, knowledge based, and empirical—had examples with high and low correlation, showing no bias/advantage for any particular approach. The data across all the participants were combined to identify 63 complexes that were poorly scored across the majority of the scoring methods and 123 complexes that were scored well across the majority. The two sets were compared using a Wilcoxon rank-sum test to assess any significant difference in the distributions of >400 physicochemical properties of the ligands and the proteins. Poorly scored complexes were found to have ligands that were the same size as those in well-scored complexes, but hydrogen bonding and torsional strain were significantly different. These comparisons point to a need for CSAR to develop data sets of congeneric series with a range of hydrogen-bonding and hydrophobic characteristics and a range of rotatable bonds.
Recently some of us made intrinsic aqueous solubility measurements for 132 structurally diverse drugs and biologically significant molecules. We then issued, in conjunction with this Journal, an open prediction challenge in a paper where we reported the intrinsic solubilities for 100 of these compounds as a training set and listed the structures of the remaining 32 compounds which formed the prediction set. 1 This solubility challenge data were also made available online including a link on this Journal's Web site. The formal Solubility Challenge was held from approximately July 15, 2008 through September 15, 2008 although late and/or changed entries were accepted until the end of September. More than 100 entries to the Solubility Challenge have been received. In several cases multiple entries came from the same person or group. In addition, more than 5% of the prediction sheet entries were incomplete in that predictions were not reported for all 32 compounds of the prediction set. These incomplete entries were not included in the overall findings given here, but the submitted prediction sheets, like those of all other entries, were scored and returned to the contestants along with a copy of the findings of this challenge. Overall, 99 completed entries were scored and are reported here.Before presenting some of the findings from this challenge, it is important to reiterate what was stated at the outset of the challenge. The goal of this Solubility Challenge is not to identify a 'winner' but rather to adVance our general understanding of how to better perform aqueous solubility estimations. The findings of this challenge also proVide us a perspectiVe on the current state of predictiVe capabilities.The prediction set of compounds and their measured intrinsic solubilities are given in Table 1. Some qualifiers had to be applied to the prediction set to fairly and unambiguously score the contestants' data. Two of the prediction set compounds exhibit polymorph solubility behavior, and the solubility of each polymorph has been
A major goal in drug design is the improvement of computational methods for docking and scoring. The Community Structure Activity Resource (CSAR) has collected several data sets from industry and added in-house data sets that may be used for this purpose (). CSAR has currently obtained data from Abbott, GlaxoSmithKline, and Vertex and is working on obtaining data from several others. Combined with our in-house projects, we are providing a data set consisting of 6 protein targets, 647 compounds with biological affinities, and 82 crystal structures. Multiple congeneric series are available for several targets with a few representative crystal structures of each of the series. These series generally contain a few inactive compounds, usually not available in the literature, to provide an upper bound to the affinity range. The affinity ranges are typically 3–4 orders of magnitude per series. For our in-house projects, we have had compounds synthesized for biological testing. Affinities were measured by Thermofluor, Octet RED, and isothermal titration calorimetry for the most soluble. This allows the direct comparison of the biological affinities for those compounds, providing a measure of the variance in the experimental affinity. It appears that there can be considerable variance in the absolute value of the affinity, making the prediction of the absolute value ill-defined. However, the relative rankings within the methods are much better, and this fits with the observation that predicting relative ranking is a more tractable problem computationally. For those in-house compounds, we also have measured the following physical properties: logD, logP, thermodynamic solubility, and pKa. This data set also provides a substantial decoy set for each target consisting of diverse conformations covering the entire active site for all of the 58 CSAR-quality crystal structures. The CSAR data sets (CSAR-NRC HiQ and the 2012 release) provide substantial, publically available, curated data sets for use in parametrizing and validating docking and scoring methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.