Drug molecules consist of a few tens of atoms connected by covalent bonds. How many such molecules are possible in total and what is their structure? This question is of pressing interest in medicinal chemistry to help solve the problems of drug potency, selectivity, and toxicity and reduce attrition rates by pointing to new molecular series. To better define the unknown chemical space, we have enumerated 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens forming the chemical universe database GDB-17, covering a size range containing many drugs and typical for lead compounds. GDB-17 contains millions of isomers of known drugs, including analogs with high shape similarity to the parent drug. Compared to known molecules in PubChem, GDB-17 molecules are much richer in nonaromatic heterocycles, quaternary centers, and stereoisomers, densely populate the third dimension in shape space, and represent many more scaffold types.
The chemical space is the ensemble of all possible molecules, which is believed to contain at least 10^60 organic molecules below 500 Da of possible interest for drug discovery. This review summarizes the development of the chemical space concept from enumerating acyclic hydrocarbons in the 1800s to the recent assembly of the chemical universe database GDB. Chemical space travel algorithms can be used to explore defined regions of chemical space by generating focused virtual libraries. Maps of the chemical space are produced from property spaces visualized by principal component analysis or by self-organizing maps, and from structural analyses such as the scaffold-tree or the MQN-system. Virtual screening of virtual chemical space followed by synthesis and testing of the best hits leads to the discovery of new drug molecules.
Reversible epidermal growth factor receptor (EGFR) inhibitors are the first class of small molecules to improve progression-free survival of patients with EGFR-mutated lung cancers. Second-generation EGFR inhibitors introduced to overcome acquired resistance by the T790M resistance mutation of EGFR have thus far shown limited clinical activity in patients with T790M-mutant tumors. In this study, we systematically analyzed the determinants of the activity and selectivity of the second-generation EGFR inhibitors. A focused library of irreversible as well as structurally corresponding reversible EGFR-inhibitors was synthesized for chemogenomic profiling involving over 79 genetically defined NSCLC and 19 EGFR-dependent cell lines. Overall, our results show that the growth-inhibitory potency of all irreversible inhibitors against the EGFR
In the field of medicinal chemistry, the chemical space describes the ensemble of all organic molecules to be considered when searching for new drugs (estimated >10^60 molecules), as well as the property spaces in which these molecules are placed for the sake of describing them. Molecules can be enumerated computationally by the millions, which was first undertaken in the field of computer‐aided structure elucidation. Scoring the enumerated virtual libraries by virtual screening has recently become an attractive strategy to prioritize compounds for synthesis and testing. Enumeration methods include combinatorial linking of fragments, genetic algorithms based on cycles of enumeration and selection by ligand‐based or target‐based scoring functions, and exhaustive enumeration from first principles. The chemical space of molecules following simple rules of chemical stability and synthetic feasibility has been enumerated up to 13 atoms of C, N, O, Cl, S, forming the GDB‐13 database with 977 million structures. The database has been organized in a 42‐dimensional chemical space using molecular quantum numbers (MQN) as descriptors, which can be visualized by projection in two dimensions by principal component analysis, and searched within seconds using a Web browser available at www.gdb.unibe.ch. © 2012 John Wiley & Sons, Ltd. This article is categorized under: Computer and Information Science > Chemoinformatics
The chemical universe database GDB-17 contains 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens obeying rules for chemical stability, synthetic feasibility, and medicinal chemistry. GDB-17 was analyzed using 42 integer value descriptors of molecular structure which we term "Molecular Quantum Numbers" (MQN). Principal component analysis and representation of the (PC1, PC2)-plane provided a graphical overview of the GDB-17 chemical space. Rapid ligand-based virtual screening (LBVS) of GDB-17 using the city-block distance CBD(MQN) as a similarity search measure was enabled by a hashed MQN-fingerprint. LBVS of the entire GDB-17 and of selected subsets identified shape similar, scaffold hopping analogs (ROCS > 1.6 and T(SF) < 0.5) of 15 drugs. Over 97% of these analogs occurred within CBD(MQN) ≤ 12 from each drug, a constraint which might help focus advanced virtual screening. An MQN-searchable 50 million subset of GDB-17 is publicly available at www.gdb.unibe.ch.
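As a rough illustration of the similarity measure mentioned in this abstract, the sketch below computes a city-block (Manhattan) distance between two 42-integer descriptor vectors and keeps library entries within a distance cutoff of 12. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the molecule labels, and the toy vectors are hypothetical, the vectors are not real MQN values, and the hashed fingerprint used for fast search in the paper is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

N_DESCRIPTORS = 42  # the abstract describes 42 integer-valued MQN descriptors


def city_block_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute per-descriptor differences (Manhattan distance)."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def neighbors_within(
    query: Sequence[int],
    library: Dict[str, Sequence[int]],
    max_distance: int = 12,  # cutoff taken from the CBD(MQN) <= 12 observation
) -> List[Tuple[str, int]]:
    """Return (name, distance) pairs within the cutoff, nearest first."""
    hits = []
    for name, vector in library.items():
        distance = city_block_distance(query, vector)
        if distance <= max_distance:
            hits.append((name, distance))
    # Sort by distance, then by name so ties come out in a stable order.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Toy data: made-up vectors for illustration only, not real MQN descriptors.
query = [0] * N_DESCRIPTORS
library = {
    "analog_a": [1, 2] + [0] * 40,   # distance 3 from the query
    "analog_b": [5] * 4 + [0] * 38,  # distance 20, outside the cutoff
    "analog_c": [0] * N_DESCRIPTORS, # identical to the query
}

for name, distance in neighbors_within(query, library):
    print(name, distance)
```

A linear scan like this is only practical for small sets; searching billions of molecules would need an index such as the hashed fingerprint the abstract refers to.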