Surface reaction networks involving hydrocarbons exhibit enormous complexity with thousands of species and reactions for all but the very simplest of chemistries. We present a framework for optimization under uncertainty for heterogeneous catalysis reaction networks using surrogate models that are trained on the fly. The surrogate model is constructed by teaching a Gaussian process adsorption energies based on group additivity fingerprints, combined with transition-state scaling relations and a simple classifier for determining the rate-limiting step. The surrogate model is iteratively used to predict the most important reaction step to be calculated explicitly with computationally demanding electronic structure theory. Applying these methods to the reaction of syngas on rhodium(111), we identify the most likely reaction mechanism. Propagating uncertainty throughout this process yields the likelihood that the final mechanism is complete given measurements on only a subset of the entire network and uncertainty in the underlying density functional theory calculations.
Catalyst discovery and optimization is key to solving many societal and energy challenges including solar fuel synthesis, long-term energy storage, and renewable fertilizer production. Despite considerable effort by the catalysis community to apply machine learning models to the computational catalyst discovery process, it remains an open challenge to build models that can generalize across both elemental compositions of surfaces and adsorbate identity/configurations, perhaps because datasets have been smaller in catalysis than in related fields. To address this, we developed the OC20 dataset, consisting of 1,281,040 density functional theory (DFT) relaxations (∼264,890,000 single-point evaluations) across a wide swath of materials, surfaces, and adsorbates (nitrogen, carbon, and oxygen chemistries). We supplemented this dataset with randomly perturbed structures, short timescale molecular dynamics, and electronic structure analyses. The dataset comprises three central tasks indicative of day-to-day catalyst modeling and comes with predefined train/validation/test splits to facilitate direct comparisons with future model development efforts. We applied three state-of-the-art graph neural network models (CGCNN, SchNet, and DimeNet++) to each of these tasks as baseline demonstrations for the community to build on. In almost every task, no upper limit on model size was identified, suggesting that even larger models are likely to improve on initial results. The dataset and baseline models are both provided as open resources as well as a public leader board to encourage community contributions to solve these important tasks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.