CAS Common Chemistry ( ) is an open web resource that provides the scientific community with access to reliable chemical substance information. Since its creation in 2009, the resource has served millions of visitors, and in 2021 it was extensively updated with significant enhancements. The underlying dataset was expanded from 8,000 to 500,000 chemical substances and now includes additional associated information, such as basic properties and computer-readable chemical structure information. Enhanced search capabilities and an integrated application programming interface (API) support new use cases. The content is licensed under a Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license, which allows other public resources to integrate the data into their systems. This paper provides an overview of the enhancements to data and functionality, discusses the benefits of the contribution to the chemistry community, and summarizes recent progress in leveraging this resource to strengthen other information sources.
The design of chemical formulations is a challenging, high-dimensional problem. In typical formulations, tens of thousands of ingredients are available for use, yet only a tiny fraction end up in any given formulation. Deformulation, the problem of reverse engineering the precise amount of each ingredient starting from only a list of ingredients, is similarly challenging, but it is a key capability for keeping up with industry competitors. Here, we take advantage of a large, curated formulations dataset from CAS, a division of the American Chemical Society, which offers a consistent and highly structured representation of formulations and the chemical identities of their components. Using this dataset, we show that a variational autoencoder neural network learns meaningful representations of formulations across product classes such as antiperspirants and oral care. Furthermore, the variational autoencoder can be combined with a two-step sampling algorithm to generate accurate ingredient amount suggestions for deformulation. Deformulation using a variational autoencoder produces estimates that are significantly more accurate than those of nearest neighbor methods, extrapolates better to formulations that differ significantly from previously seen formulations, and provides a way to leverage large datasets for industrially relevant capabilities.
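To make the deformulation task and its nearest neighbor baseline concrete, here is a minimal sketch of one plausible nearest neighbor estimator. The abstract does not describe the exact baseline used, so this is an illustration under stated assumptions, not the authors' method: formulations are assumed to be stored as mappings from ingredient name to weight percent, similarity is assumed to be Jaccard overlap of ingredient sets, and the function names and toy data are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical representation: ingredient name -> weight percent.
Formulation = Dict[str, float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two ingredient sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_neighbor_amounts(
    query: List[str], library: List[Formulation], k: int = 3
) -> Dict[str, float]:
    """Hypothetical baseline: estimate amounts for the query ingredients
    from the k formulations whose ingredient sets overlap most with the query.

    Amounts are summed per ingredient across the neighbors (a neighbor lacking
    an ingredient contributes 0) and renormalized to 100%, which is equivalent
    to averaging and then renormalizing.
    """
    names = list(dict.fromkeys(query))  # drop duplicate names, keep order
    if not names:
        return {}
    q = set(names)
    # sorted() is stable, so ties keep their library order.
    neighbors = sorted(library, key=lambda f: jaccard(q, set(f)), reverse=True)[:k]
    totals = {name: 0.0 for name in names}
    for f in neighbors:
        for name in names:
            totals[name] += f.get(name, 0.0)
    total = sum(totals.values())
    if total == 0:
        # No neighbor contains any query ingredient: fall back to equal amounts.
        return {name: 100.0 / len(names) for name in names}
    return {name: 100.0 * v / total for name, v in totals.items()}


if __name__ == "__main__":
    # Toy, made-up formulations for illustration only.
    library = [
        {"water": 70.0, "glycerin": 20.0, "sodium fluoride": 10.0},
        {"water": 60.0, "glycerin": 30.0, "silica": 10.0},
        {"ethanol": 50.0, "fragrance": 50.0},
    ]
    print(nearest_neighbor_amounts(["water", "glycerin", "silica"], library, k=2))
```

A baseline like this can only recombine amounts it has already seen, which is one way to read the abstract's point that a learned latent representation extrapolates better to formulations unlike those in the training data.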