Computational catalyst
screening has the potential to significantly
accelerate heterogeneous catalyst discovery. Typically, this involves
developing microkinetic reactor models that are based on parameters
obtained from density functional theory and transition-state theory.
To reduce the large computational cost involved in computing various
adsorption and transition-state energies of all possible surface states
on a large number of catalyst models, linear scaling relations for
surface intermediates and transition states have been developed that
depend on only a few descriptors, typically one or two, such as the
carbon atom adsorption energy. As a result, only the descriptor values
have to be computed for various active site models to generate volcano
curves in activity or selectivity. Unfortunately, for more complex
chemistries, the predictive accuracy of linear scaling relations is unknown.
Also, the selection of descriptors is essentially a trial-and-error
process. Here, using a database of adsorption energies of the surface
species involved in the decarboxylation and decarbonylation of propionic
acid over eight monometallic transition-metal catalyst surfaces (Ni,
Pt, Pd, Ru, Rh, Re, Cu, Ag), we tested whether nonlinear machine learning
(ML) models can outperform linear scaling relations when predicting
the adsorption energies of various species on one metal surface from
data on the remaining metal surfaces.
We found that linear scaling relations hold well for predictions across
metals, with a mean absolute error of 0.12 eV, and that ML methods are
unable to outperform linear scaling relations when the training dataset
contains a complete set of energies for all of the species on various
metal surfaces. Only when the training dataset is incomplete, namely,
contains a random subset of species’ energies for each metal,
a currently unlikely scenario for catalyst screening, do kernel-based
ML models significantly outperform linear scaling relations. We also
found that simple coordinate-free species descriptors, such as bond
counts, achieve as good results as sophisticated coordinate-based
descriptors. Finally, we propose an approach for automatic discovery
of appropriate metal descriptors using principal component analysis.
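
The cross-metal test described above can be illustrated with a minimal sketch: fit a one-descriptor linear scaling relation on all metals but one, predict the held-out metal, and average the absolute errors. Everything below is an assumption made for illustration: the descriptor values, the species energies, and the function names `fit_line` and `leave_one_metal_out_mae` are made up and are not taken from the paper's database or code.

```python
from statistics import mean

METALS = ["Ni", "Pt", "Pd", "Ru", "Rh", "Re", "Cu", "Ag"]

# Made-up descriptor values in eV (stand-ins for, e.g., a carbon atom
# adsorption energy on each metal). Not real DFT data.
descriptor = dict(zip(METALS, [-6.5, -7.0, -6.8, -7.3, -7.1, -7.8, -5.0, -3.5]))

# Made-up adsorption energies of one hypothetical species: an exact linear
# trend (slope 0.5, intercept 1.0) plus small fixed perturbations.
perturbation = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03, -0.02, -0.01]
energies = {m: 0.5 * descriptor[m] + 1.0 + p for m, p in zip(METALS, perturbation)}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def leave_one_metal_out_mae(descriptor, energies):
    """Mean absolute error when each metal is predicted from the others."""
    errors = []
    for held_out in energies:
        train = [m for m in energies if m != held_out]
        slope, intercept = fit_line(
            [descriptor[m] for m in train], [energies[m] for m in train]
        )
        predicted = slope * descriptor[held_out] + intercept
        errors.append(abs(predicted - energies[held_out]))
    return mean(errors)


if __name__ == "__main__":
    mae = leave_one_metal_out_mae(descriptor, energies)
    print(f"leave-one-metal-out MAE: {mae:.3f} eV")
```

A nonlinear model could be evaluated with the same hold-out loop by replacing `fit_line` and the prediction line, which keeps the comparison between model types on equal footing.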
Computational catalyst discovery involves the development of microkinetic reactor models based on estimated parameters determined from density functional theory (DFT). For complex surface chemistries, the cost of calculating the adsorption energies by DFT for a large number of reaction intermediates can become prohibitive. Here, we have identified appropriate descriptors and machine learning models that can be used to predict part of these adsorption energies given data on the rest of them. Our investigations also included the case in which the species used to train the predictive model differ in size from the species the model tries to predict, an extrapolation in the data space that is typically difficult with regular machine learning models. We have developed a neural-network-based predictive model that combines an established model with the concepts of a convolutional neural network and that, when extrapolating, achieves significant improvement over the previous models. (arXiv:1910.00623v1 [physics.chem-ph], 1 Oct 2019)
Computational catalyst discovery involves identification of a meaningful model and suitable descriptors that determine the catalyst properties. We study the impact of combining various descriptors (e.g., reaction energies, metal descriptors, and bond counts) for modeling transition-state (TS) energies based on a database of adsorption and TS energies across transition-metal surfaces for the decarboxylation and decarbonylation of propionic acid, a chemistry characteristic of biomass conversion. Results of different machine learning models for more than 1572 descriptor combinations suggest that there is no statistically significant difference between linear and nonlinear models when using the right combination of reactant energies, metal descriptors, and bond counts. However, linear models are inferior when bond counts and metal descriptors are not included. Furthermore, when data are missing for reaction steps on all metals, conventional linear scaling is inferior to linear and nonlinear models with a proper choice of descriptors, which are surprisingly robust.
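
Screening descriptor combinations, as described above, amounts to enumerating subsets of the available descriptors and scoring a model on each. The sketch below shows only that enumeration step. The group names in `DESCRIPTOR_GROUPS` and the helpers `descriptor_subsets` and `best_subset` are hypothetical, and the scoring function is left to the caller (for example, a cross-validated mean absolute error); none of this is the authors' code.

```python
from itertools import combinations

# Hypothetical descriptor groups, named for illustration only.
DESCRIPTOR_GROUPS = ["reaction_energy", "metal_descriptor", "bond_counts"]


def descriptor_subsets(groups):
    """Yield every non-empty combination of descriptor groups, smallest first."""
    for size in range(1, len(groups) + 1):
        for subset in combinations(groups, size):
            yield subset


def best_subset(groups, score):
    """Return the subset with the lowest score (lower is better)."""
    return min(descriptor_subsets(groups), key=score)


if __name__ == "__main__":
    for subset in descriptor_subsets(DESCRIPTOR_GROUPS):
        print(subset)
```

With n descriptor groups there are 2^n - 1 non-empty subsets, so exhaustive enumeration is only practical for a modest number of groups.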