Inferring the biological function of a protein from its three-dimensional structure and explaining why a drug may bind to various targets are both of crucial importance to modern drug discovery. Here we present a generic 4833-integer vector describing druggable protein-ligand binding sites that can be applied to any protein and any binding cavity. The fingerprint registers counts of pharmacophoric triplets from the Calpha atomic coordinates of binding-site-lining residues. Starting from a customized data set of diverse protein-ligand binding site pairs, the most appropriate metric and a similarity threshold could be defined for similar binding sites. The method (FuzCav) has been used in various scenarios: (i) screening a collection of 6000 binding sites for similarity to different queries; (ii) classifying protein families (serine endopeptidases, protein kinases) by binding site diversity; (iii) discriminating adenine-binding cavities from decoys. Fingerprint generation and comparison support ultra-high throughput (ca. 1000 measures/s), do not require prior alignment of protein binding sites, and are able to detect local similarity among subpockets. The method is thus particularly well suited to the functional annotation of novel genomic structures with low sequence identity to known X-ray templates.
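The idea of counting pharmacophoric triplets over Calpha coordinates can be illustrated with a minimal sketch. It is not the published FuzCav implementation: the distance bins, the one-label-per-residue simplification, the canonical key, and the similarity function (`BIN_EDGES`, `triplet_fingerprint`, `shared_feature_similarity`) are all assumptions made for illustration, and the sketch stores counts sparsely rather than as a fixed 4833-integer vector.

```python
from collections import Counter
from itertools import combinations, permutations
from math import dist

# Hypothetical upper bounds of the distance bins, in angstroms (an assumption).
BIN_EDGES = (4.8, 7.2, 9.5, 11.9, 14.3)


def distance_bin(d):
    """Return the index of the first bin whose upper bound is >= d, else None."""
    for index, edge in enumerate(BIN_EDGES):
        if d <= edge:
            return index
    return None


def canonical_key(labels, bins):
    """Build a key for a labelled triangle that does not depend on vertex order.

    labels[i] is the feature of vertex i; bins[(i, j)] with i < j is the
    distance bin of the edge between vertices i and j.
    """
    def edge(i, j):
        return bins[(min(i, j), max(i, j))]

    candidates = [
        ((labels[a], labels[b], labels[c]), (edge(a, b), edge(b, c), edge(a, c)))
        for a, b, c in permutations(range(3))
    ]
    return min(candidates)


def triplet_fingerprint(residues):
    """Count feature triplets; residues is a list of (label, (x, y, z)) pairs."""
    counts = Counter()
    for trio in combinations(residues, 3):
        labels = [label for label, _ in trio]
        bins = {}
        for i, j in combinations(range(3), 2):
            b = distance_bin(dist(trio[i][1], trio[j][1]))
            if b is None:
                break  # one edge is too long: skip this triplet
            bins[(i, j)] = b
        else:
            counts[canonical_key(labels, bins)] += 1
    return counts


def shared_feature_similarity(fp_a, fp_b):
    """Illustrative metric: shared non-zero features over the smaller feature set."""
    smallest = min(len(fp_a), len(fp_b))
    return len(fp_a.keys() & fp_b.keys()) / smallest if smallest else 0.0
```

Because only inter-residue distances enter the key, the counts are unchanged by rotating or translating the site, which is consistent with the alignment-free comparison described above.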
The present study introduces a novel low-dimensionality fingerprint encoding both ligand and target properties which is suitable to mine protein-ligand chemogenomic space. Whereas ligand properties have been represented by standard descriptors, protein cavities are encoded by a fixed-length bit string describing pharmacophoric properties of a definite number of binding site residues. In order to simplify the cavity fingerprint, the concept was applied here to a unique family of targets (G protein-coupled receptors) with a homogeneous cavity description. Particular attention was given to setting up data sets of highly diverse protein-ligand pairs covering both ligand and target spaces as exhaustively as possible. Several machine learning classification algorithms were trained on two sets of roughly 200000 receptor-ligand fingerprints with a different definition of inactive decoys. Cross-validated models show excellent precision (>0.9) in distinguishing true from false pairs, with a particular preference for support vector machine classifiers. When applied to two external test sets of GPCR ligands, the most predictive models were not those performing the best in the previous cross-validation. The ability to recover true GPCR ligands (ligand prediction mode) or true GPCRs (receptor prediction mode) depends on multiple parameters: the molecular complexity of the ligands, the chemical space from which ligand decoys are selected to generate false protein-ligand pairs, and the target space under consideration. In most cases, predicting ligands is easier than predicting receptors. Although receptor profiling is possible, it probably requires a more detailed description of the ligand-binding site. Notably, protein-ligand fingerprints outperform the corresponding ligand fingerprints in mining the GPCR-ligand space. Since they can be applied to a much larger number of receptors than ligand-based fingerprints, protein-ligand fingerprints represent a novel and promising way to directly screen protein-ligand pairs in chemogenomic applications.
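A rough sketch of how such a pair fingerprint might be assembled, and how the reported precision is computed, is given below. The property list `PROPERTIES`, the helper names, and the one-bit-per-property encoding are hypothetical choices for illustration; the abstract does not specify the actual descriptors or bit layout.

```python
# Hypothetical pharmacophoric properties encoded per cavity residue (an assumption).
PROPERTIES = ("hydrophobic", "aromatic", "donor", "acceptor", "positive", "negative")


def cavity_bits(residue_properties):
    """Encode a fixed list of cavity positions as one bit per property per position.

    residue_properties is a list of sets of property names, one set per position,
    so the output length is always len(residue_properties) * len(PROPERTIES).
    """
    bits = []
    for properties in residue_properties:
        bits.extend(1 if name in properties else 0 for name in PROPERTIES)
    return tuple(bits)


def pair_fingerprint(ligand_bits, cavity):
    """Concatenate a ligand descriptor and a cavity bit string into one vector."""
    return tuple(ligand_bits) + tuple(cavity)


def precision(predicted, actual):
    """Precision = true positives / (true positives + false positives)."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    if true_pos + false_pos == 0:
        raise ValueError("no positive predictions, precision is undefined")
    return true_pos / (true_pos + false_pos)
```

A classifier trained on such vectors labels each protein-ligand pair as true or false; a precision above 0.9 then means that more than nine out of ten pairs predicted as true are real pairs.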
As part of a large medicinal chemistry program, we wish to develop novel selective estrogen receptor modulators (SERMs) as potential breast cancer treatments using a combination of experimental and computational approaches. However, one of the remaining difficulties is to fully integrate computational (i.e., virtual, theoretical) and medicinal (i.e., experimental, intuitive) chemistry to take advantage of the full potential of both. For this purpose, we have developed a Web-based platform, Forecaster, and a number of programs (e.g., Prepare, React, Select) with the aim of combining computational chemistry and medicinal chemistry expertise to facilitate drug discovery and development and, more specifically, to integrate synthesis into computer-aided drug design. In our quest for potent SERMs, this platform was used to build virtual combinatorial libraries, filter and extract a highly diverse library from the NCI database, and dock them to the estrogen receptor (ER), with all of these steps being fully automated by computational chemists for use by medicinal chemists. As a result, virtual screening of a diverse library seeded with active compounds followed by a search for analogs yielded an enrichment factor of 129, with 98% of the seeded active compounds recovered, while the screening of a designed virtual combinatorial library including known actives yielded an area under the receiver operating characteristic curve (AU-ROC) of 0.78. The lead optimization proved less successful, further demonstrating the challenge of simulating structure-activity relationship studies.
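The two metrics quoted above can be computed from a ranked screening list as in the following sketch. It assumes the common textbook definitions and that a higher score means a compound is ranked as more likely active; the abstract does not state the selection fraction or scoring convention behind its figures, so the `top_fraction` parameter and function names are illustrative.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """Ratio of the active rate in the top-ranked fraction to the overall active rate."""
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    n_total = len(ranked)
    n_top = max(1, round(n_total * top_fraction))
    actives_total = sum(1 for active in is_active if active)
    if actives_total == 0:
        raise ValueError("the library contains no actives")
    actives_top = sum(1 for _, active in ranked[:n_top] if active)
    return (actives_top / n_top) / (actives_total / n_total)


def roc_auc(scores, is_active):
    """Probability that a random active outscores a random decoy (ties count 0.5)."""
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = sum(
        1.0 if a > d else 0.5 if a == d else 0.0
        for a in actives
        for d in decoys
    )
    return wins / (len(actives) * len(decoys))
```

An enrichment factor of 1 corresponds to random selection, and an AU-ROC of 0.5 to random ranking, so both reported values indicate better-than-random retrieval of actives.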
The use of predictive computational methods in the drug discovery process is in a state of continual growth. Over the last two decades, an increasingly large number of docking tools have been developed to identify hits or optimize lead molecules through in silico screening of chemical libraries against proteins. In recent years, the focus has been on implementing protein flexibility and water molecules. Our efforts led to the development of Fitted, first reported in 2007 and further developed since then. In this study, we wished to evaluate the impact of protein flexibility and the occurrence of water molecules on the ability of the Fitted docking program to discriminate active compounds from inactive compounds in virtual screening (VS) campaigns. For this purpose, a total of 171 proteins cocrystallized with small molecules, representing 40 unique enzymes and receptors, as well as sets of known ligands and decoys were selected from the Protein Data Bank (PDB) and the Directory of Useful Decoys (DUD), respectively. This study revealed that implementing displaceable crystallographic or computationally placed particle water molecules and protein flexibility can improve the enrichment in active compounds. In addition, an informed decision on which implementation to use, based on library diversity or research objectives (hit discovery vs lead optimization), may lead to significant improvements.