A large set of organic compounds extracted from the CAS Registry is analyzed to study recent changes in structural diversity. The diversity is characterized using the framework content of the compounds; the framework of a molecule is the scaffold consisting of all its ring systems and all the chain fragments connecting them. The compounds are partitioned based on their year of first report in the literature, which allows framework occurrence frequencies to be compared across a 10-year interval. The results are consistent with a process in which frameworks with the greatest frequency of use in the past are the most likely to be used again, but it is also found that the frequency ordering changes over time. These fluctuations in ordering are attributed to stochastic factors, scientific and economic, that can affect how chemical space is explored. Framework diversity is found to have increased over time despite the extensive reuse of a relatively small number of frameworks; this increase is due to the large number of new frameworks. The long tail of the framework distribution, composed of frameworks that occur in few compounds or only one compound, is found to be a large and growing part of framework space.
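The framework definition above (all ring systems plus the chain fragments connecting them) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a molecule is given as a plain list of bonds between integer atom indices, ignores atom and bond types, and uses the hypothetical helper name `framework_atoms`. Repeatedly stripping terminal atoms removes side chains and leaves only rings and linkers.

```python
def framework_atoms(bonds):
    """Return the atoms that remain after repeatedly stripping terminal atoms.

    bonds: list of (atom_a, atom_b) pairs with integer atom indices
    (a hypothetical input format chosen for illustration).
    """
    # Build an adjacency map from the bond list.
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Atoms with at most one neighbor are side-chain ends; removing them
    # may expose new ends, so keep going until none are left.
    stack = [atom for atom, ns in neighbors.items() if len(ns) <= 1]
    while stack:
        atom = stack.pop()
        if atom not in neighbors:
            continue  # already removed
        for n in neighbors.pop(atom):
            neighbors[n].discard(atom)
            if len(neighbors[n]) <= 1:
                stack.append(n)

    # What survives is every ring atom plus every atom on a ring-to-ring linker.
    return set(neighbors)


# Toluene-like graph: a six-membered ring (atoms 0-5) with one substituent (atom 6).
toluene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
print(sorted(framework_atoms(toluene)))
```

Under these assumptions an acyclic molecule yields an empty framework, and a substituent hanging off a linker atom is removed while the linker itself is kept.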
Measuring innovation in the pharmaceutical industry is challenging. Counts of new molecular entities (NMEs) approved by the Food and Drug Administration (FDA) are commonly used, but this measure gauges only quantity, not innovativeness. A new indicator of innovation for small molecule and peptide drugs based on structural novelty is proposed and used to analyze recent trends in pharmaceutical innovation. We show that pharmaceutical innovation has significantly increased over the last several decades despite recent concerns over an innovation crisis, and find that Pioneers (NMEs whose shape and scaffold were not used in any previously FDA-approved drug) are significantly more likely to be the source of promising new therapies. Analysis of the underlying source of structural innovation indicates that Pioneers tend to be based on scaffolds first reported in the CAS REGISTRY five or fewer years prior to their Investigational New Drug application (IND), or on scaffolds populated with 50 or fewer other compounds at the time of IND. Our analysis also shows a widening structural innovation gap between large pharmaceutical companies (Big Pharma) and the rest of the ecosystem, even though the number of Big Pharma-originated Pioneers has increased.
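The Pioneer definition above can be sketched as a single pass over approvals in chronological order. This is only an illustration under stated assumptions: each drug is reduced to a scaffold identifier and a shape identifier (how those are computed is not specified here), a drug counts as a Pioneer only when both are previously unseen, and the function name `flag_pioneers` and the sample records are hypothetical.

```python
def flag_pioneers(approvals):
    """Return the names of drugs whose scaffold AND shape are both new.

    approvals: iterable of (name, scaffold_id, shape_id) tuples, assumed to
    be sorted by approval date (hypothetical format for illustration).
    """
    seen_scaffolds = set()
    seen_shapes = set()
    pioneers = []
    for name, scaffold, shape in approvals:
        # A Pioneer reuses neither a scaffold nor a shape from earlier approvals.
        if scaffold not in seen_scaffolds and shape not in seen_shapes:
            pioneers.append(name)
        seen_scaffolds.add(scaffold)
        seen_shapes.add(shape)
    return pioneers


# Made-up records, not real drugs.
example = [
    ("drug_a", "scaffold_1", "shape_1"),
    ("drug_b", "scaffold_1", "shape_2"),  # reuses a scaffold
    ("drug_c", "scaffold_2", "shape_3"),  # new scaffold and new shape
    ("drug_d", "scaffold_3", "shape_1"),  # reuses a shape
]
print(flag_pioneers(example))
```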
The development of molecular descriptors is a central challenge in cheminformatics. Most approaches use algorithms that extract atomic environments or end-to-end machine learning. However, a looming question is how these approaches compare with the critical eye of trained chemists. The CAS fingerprint engages expert chemists to curate chemical motifs that they deem could influence bioactivity. In this paper, we benchmark the CAS fingerprint against commonly used fingerprints using a well-established benchmark set of 88 targets. We show that the CAS fingerprint outperforms most of the commonly used molecular fingerprints. Analysis of the CAS fingerprint reveals that experts tend to select features that are rarely reported in the literature, though not all rare features are selected. Our analysis also shows that the CAS fingerprint provides a different source of information compared to other commonly used fingerprints. These results suggest that anthropomorphic insights do have predictive power and highlight the importance of a chemist-in-the-loop approach in the era of machine learning.