A largely unsolved problem in computational biochemistry is the accurate prediction of binding affinities of small ligands to protein receptors. We present a detailed analysis of the systematic and random errors present in computational methods through the use of error probability density functions, specifically for computed interaction energies between chemical fragments comprising a protein-ligand complex. An HIV-II protease crystal structure with a bound ligand (indinavir) was chosen as a model protein-ligand complex. The complex was decomposed into twenty-one (21) interacting fragment pairs, which were studied using a number of computational methods. The chemically accurate complete basis set coupled cluster theory (CCSD(T)/CBS) interaction energies were used as reference values to generate our error estimates. In our analysis we observed significant systematic and random errors in most methods, a result that was especially surprising for the parameterized classical and semiempirical quantum mechanical calculations. After propagating these fragment-based error estimates over the entire protein-ligand complex, our total error estimates for many methods are large compared to the experimentally determined free energy of binding. Thus, we conclude that statistical error analysis is a necessary addition to any scoring function attempting to produce reliable binding affinity predictions.
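The propagation step described above can be illustrated with a minimal sketch. It assumes the usual error-propagation rules for independent fragment interactions: systematic errors (the mean of the error distribution) add linearly, while random errors (its standard deviation) add in quadrature. The function names and the four sample energies are hypothetical and are not taken from the paper; only the count of 21 fragment pairs comes from the abstract.

```python
import math
import statistics


def error_pdf_parameters(computed, reference):
    """Estimate the mean (systematic) and standard deviation (random)
    of the per-fragment errors of a method against reference energies."""
    errors = [c - r for c, r in zip(computed, reference)]
    return statistics.mean(errors), statistics.stdev(errors)


def propagate(mu, sigma, n_interactions):
    """Propagate per-interaction errors over n independent interactions.

    Assumption: systematic errors sum linearly (n * mu) and random
    errors sum in quadrature (sqrt(n) * sigma).
    """
    return n_interactions * mu, math.sqrt(n_interactions) * sigma


# Hypothetical interaction energies in kcal/mol (illustrative values only).
computed = [-1.0, -2.5, -0.5, -3.0]   # method under test
reference = [-1.5, -2.0, -1.0, -3.5]  # stand-in for CCSD(T)/CBS values

mu, sigma = error_pdf_parameters(computed, reference)
systematic, random_error = propagate(mu, sigma, n_interactions=21)
print(f"per-fragment error: mean={mu:.2f}, std={sigma:.2f}")
print(f"propagated over 21 pairs: systematic={systematic:.2f}, random={random_error:.2f}")
```

Under these assumptions, even a modest per-fragment error grows to several kcal/mol over 21 interactions, which is how the total error can become large relative to a measured binding free energy.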
Accurate potential energy models are necessary for reliable atomistic simulations of chemical phenomena. In the realm of biomolecular modeling, large systems like proteins comprise many noncovalent interactions (NCIs) that can contribute to the protein's stability and structure. This work presents two high-quality chemical databases of common fragment interactions in biomolecular systems as extracted from high-resolution Protein Data Bank crystal structures: 3380 sidechain-sidechain interactions and 100 backbone-backbone interactions that inaugurate the BioFragment Database (BFDb). Absolute interaction energies are generated with a computationally tractable explicitly correlated coupled cluster with perturbative triples [CCSD(T)-F12] "silver standard" (0.05 kcal/mol average error) for NCI that demands only a fraction of the cost of the conventional "gold standard," CCSD(T) at the complete basis set limit. By sampling extensively from biological environments, BFDb spans the natural diversity of protein NCI motifs and orientations. In addition to supplying a thorough assessment for lower-scaling force-field (2), semi-empirical (3), density functional (244), and wavefunction (45) methods (comprising >1M interaction energies), BFDb provides interactive tools for running and manipulating the resulting large datasets and offers a valuable resource for potential energy model development and validation.
DNA-encoded chemical libraries (DELs) provide a high-throughput and cost-effective route for screening billions of unique molecules for binding affinity to diverse protein targets. Identifying candidate compounds from these libraries involves affinity selection, DNA sequencing, and measuring enrichment in a sample pool of DNA barcodes. Successful detection of potent binders is affected by many factors, including selection parameters, chemical yields, library amplification, sequencing depth, sequencing errors, library sizes, and the chosen enrichment metric. To date, there has not been a clear consensus about how enrichment from DEL selections should be measured or reported. We propose a normalized z-score enrichment metric using a binomial distribution model that satisfies important criteria that are relevant for analysis of DEL selection data. The introduced metric is robust with respect to library diversity and sampling and allows for quantitative comparisons of enrichment of n-synthons from parallel DEL selections. These features enable a comparative enrichment analysis strategy that can provide valuable information about hit compounds in early-stage drug discovery.
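One way such a metric could look is sketched below. The abstract does not give the formula, so this is an assumption: the ordinary binomial z-score, (k - n p) / sqrt(n p (1 - p)), is divided by sqrt(n) so that the result depends on the observed and expected fractions rather than on sequencing depth. The function names, the uniform expected fraction, and the read counts are all hypothetical.

```python
import math


def binomial_z_score(count, total_reads, expected_fraction):
    """Plain binomial z-score of an observed barcode count."""
    mean = total_reads * expected_fraction
    std = math.sqrt(total_reads * expected_fraction * (1.0 - expected_fraction))
    return (count - mean) / std


def normalized_z_score(count, total_reads, expected_fraction):
    """Assumed normalization: z / sqrt(n), which equals
    (p_obs - p_exp) / sqrt(p_exp * (1 - p_exp))."""
    p_obs = count / total_reads
    return (p_obs - expected_fraction) / math.sqrt(
        expected_fraction * (1.0 - expected_fraction)
    )


# Hypothetical example: a 10,000-member library, uniform expectation.
p_expected = 1.0 / 10_000

shallow = normalized_z_score(500, 1_000_000, p_expected)
deep = normalized_z_score(1_000, 2_000_000, p_expected)  # same fraction, 2x depth

print(f"normalized z (1M reads): {shallow:.4f}")
print(f"normalized z (2M reads): {deep:.4f}")
print(f"plain z (1M reads): {binomial_z_score(500, 1_000_000, p_expected):.1f}")
print(f"plain z (2M reads): {binomial_z_score(1_000, 2_000_000, p_expected):.1f}")
```

In this sketch the plain z-score grows with sequencing depth even when the observed fraction is unchanged, while the normalized value stays the same, which is the kind of robustness to sampling that the abstract describes.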