Chemical space is routinely explored by machine learning methods to discover interesting molecules before time-consuming experimental synthesis is attempted. However, these methods often rely on a graph representation, ignoring the 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning approach for generating molecules in Cartesian coordinates, allowing for quantum chemical prediction of their stability. To improve sample efficiency, we learn basic chemical rules by imitation learning on the GDB-11 database to create an initial model applicable to all stoichiometries. We then deploy multiple copies of the model, each conditioned on a specific stoichiometry, in a reinforcement learning setting. The models correctly identify low-energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how reinforcement learning further refines the imitation learning model in domains far from the training data.
Modelling and understanding properties of materials from first principles require knowledge of the underlying atomistic structure. This entails knowing the individual chemical identity and position of all atoms involved. Obtaining such information for macro-molecules, nano-particles, clusters, and for the surface, interface, and bulk phases of amorphous and solid materials represents a difficult high-dimensional global optimization problem. The rise of machine learning techniques in materials science has, however, led to many compelling developments that may speed up structure searches. The complexity of such new methods has prompted a need for an efficient way of assembling them into global optimization algorithms that can be experimented with. In this paper, we introduce the Atomistic Global Optimization X (AGOX) framework and code, as a customizable approach that enables efficient building and testing of global optimization algorithms. A modular way of expressing global optimization algorithms is described and modern programming practices are used to enable that modularity in the freely available AGOX python package. A number of examples of global optimization approaches are implemented and analyzed. This ranges from random search and basin-hopping to machine learning aided approaches with on-the-fly learnt surrogate energy landscapes. The methods are show-cased on problems ranging from supported clusters over surface reconstructions to large carbon clusters and metal-nitride clusters incorporated into graphene sheets.
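The idea of expressing a global optimization algorithm as interchangeable modules can be illustrated with a minimal sketch. This is not the AGOX API: the function names (`toy_energy`, `random_candidate`, `perturb_best`, `run_search`) and the one-dimensional toy energy are assumptions made purely for illustration. The loop stays fixed while the candidate generator is swapped.

```python
import random

def toy_energy(x):
    """Hypothetical 1D energy landscape with two minima (not a real potential)."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def random_candidate(rng, best):
    """Random-search generator: ignores the search history entirely."""
    return rng.uniform(-2.0, 2.0)

def perturb_best(rng, best, step=0.3):
    """Generator that perturbs the best candidate found so far."""
    if best is None:
        return rng.uniform(-2.0, 2.0)
    return best[0] + rng.uniform(-step, step)

def run_search(generate, evaluate, n_iterations, rng):
    """Generic loop: generate a candidate, evaluate it, store it in a database."""
    database = []  # list of (candidate, energy) records
    for _ in range(n_iterations):
        best = min(database, key=lambda rec: rec[1]) if database else None
        candidate = generate(rng, best)
        database.append((candidate, evaluate(candidate)))
    return min(database, key=lambda rec: rec[1]), database

# Same loop, different generator module.
best, database = run_search(random_candidate, toy_energy, 50, random.Random(0))
best_perturb, database_perturb = run_search(perturb_best, toy_energy, 50, random.Random(0))
```

Because the generator and evaluator are passed in as arguments, a different search strategy or a surrogate energy model could be tried without touching `run_search`.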
We propose a global optimization strategy for atomistic structure determination based on two new concepts: a few-atom complementary energy landscape and atomic role models. Global optimization of costly energy expressions may be aided by performing some of the optimization on model energy landscapes. These are often based on a sum-of-atomic-contributions form that accurately reproduces every local energy minimum of the true energy expression. However, we propose that, by not including all atomic contributions, the resulting energy landscapes may become more convex, making the search for the global optimum more facile. A role model is someone we aspire to be more like; in the same vein we define the role model of an atom to be another atom whose local environment the first atom seeks to obtain itself. Basing a complementary energy landscape on the distance of some atoms from their role models in a feature space, we arrive at a useful few-atom complementary energy landscape. We show that relaxation in this landscape is an effective mutation when employed in an evolutionary algorithm used to identify the bulk cristobalite structure of SiO2 and the (1 × 4) surface reconstruction of anatase TiO2(001).
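A toy sketch may make the few-atom complementary energy more concrete. It assumes, for illustration only, that each atom is described by a short feature vector, that each selected atom takes the nearest role model in feature space, and that the energy is the plain sum of Euclidean distances. The abstract does not specify the descriptor, the assignment rule, or the exact functional form, so the function `complementary_energy` and the numbers below are hypothetical.

```python
import math

def complementary_energy(atom_features, role_models, selected):
    """Sum, over the selected atoms only, of the feature-space distance
    to the nearest role model (assumed form, not the paper's definition)."""
    total = 0.0
    for i in selected:
        total += min(math.dist(atom_features[i], rm) for rm in role_models)
    return total

# Hypothetical 2D feature vectors for three atoms and two role models.
atom_features = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
role_models = [(0.0, 0.0), (1.0, 2.0)]

# Few-atom landscape: only atoms 0 and 1 contribute.
energy_few = complementary_energy(atom_features, role_models, selected=[0, 1])
# Including every atom adds the contribution of atom 2 as well.
energy_all = complementary_energy(atom_features, role_models, selected=[0, 1, 2])
```

Leaving atoms out of `selected` removes their terms from the sum, which is the sense in which the landscape is "few-atom"; whether that makes a given landscape more convex depends on the system and is not shown by this sketch.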