Artificial intelligence (AI) can accelerate catalyst design by
identifying key physicochemical descriptive parameters that correlate
with the underlying processes triggering, favoring, or hindering
catalytic performance. In analogy to genes in biology, these parameters might
be called “materials genes” of heterogeneous catalysis.
However, widely used AI methods require big data, and only a small
fraction of the available data meets the quality requirements for data-efficient
AI. Here, we use rigorous experimental procedures, designed to consistently
account for the kinetics of formation of the catalysts' active states,
to measure 55 physicochemical parameters as well as the reactivity
of 12 catalysts toward ethane, propane, and n-butane
oxidation. These materials are based on the redox-active elements vanadium
or manganese and present diverse phase compositions, crystallinities,
and catalytic behaviors. By applying the sure-independence-screening-and-sparsifying-operator
(SISSO) symbolic-regression approach to this consistent data set, we identify
nonlinear property–function relationships that depend on several
key parameters and reflect the intricate interplay of processes
governing the formation of olefins and oxygenates: local transport,
site isolation, surface redox activity, adsorption, and the dynamic
restructuring of the material under reaction conditions. These processes
are captured by parameters derived from N₂ adsorption,
X-ray photoelectron spectroscopy (XPS), and near-ambient-pressure
in situ XPS. The data-centric approach indicates the most relevant
characterization techniques for catalyst design and provides
“rules” on how the catalyst properties may be tuned
to achieve the desired performance.
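The SISSO approach combines two ideas. Sure independence screening (SIS) keeps only the candidate descriptors most correlated with the target, or with the residual of the previous, lower-dimensional model. A sparsifying operator, here an exhaustive ℓ0 search, then picks the best d-dimensional linear model among the screened candidates. The Python sketch below is a heavily simplified illustration, not the implementation used for the study. The operator set, the single rung of feature construction, the helper names (build_feature_space, sis, fit, sisso), and the synthetic data in the usage check are all hypothetical choices made for brevity.

```python
import itertools

import numpy as np


def build_feature_space(X, names):
    """Hypothetical rung-1 feature space: primary features plus unary and binary combinations."""
    feats, labels = [], []
    n = X.shape[1]
    for i in range(n):
        xi = X[:, i]
        feats += [xi, xi ** 2]
        labels += [names[i], f"({names[i]})^2"]
        if np.all(xi > 0):  # log and sqrt only where they are defined
            feats += [np.log(xi), np.sqrt(xi)]
            labels += [f"log({names[i]})", f"sqrt({names[i]})"]
    for i, j in itertools.combinations(range(n), 2):
        xi, xj = X[:, i], X[:, j]
        feats += [xi + xj, xi - xj, xi * xj]
        labels += [f"({names[i]}+{names[j]})", f"({names[i]}-{names[j]})",
                   f"({names[i]}*{names[j]})"]
        if np.all(xj != 0):
            feats.append(xi / xj)
            labels.append(f"({names[i]}/{names[j]})")
    F = np.column_stack(feats)
    # Drop non-finite or constant candidates, which carry no information.
    keep = np.all(np.isfinite(F), axis=0) & (F.std(axis=0) > 1e-12)
    return F[:, keep], [lab for lab, k in zip(labels, keep) if k]


def sis(F, target, k):
    """Sure independence screening: indices of the k candidates most correlated with target."""
    Fc = F - F.mean(axis=0)
    tc = target - target.mean()
    score = np.abs(Fc.T @ tc) / (np.linalg.norm(Fc, axis=0) * np.linalg.norm(tc) + 1e-12)
    return np.argsort(score)[::-1][:k]


def fit(F, y, cols):
    """Ordinary least squares on the chosen descriptor columns plus an intercept."""
    A = np.column_stack([F[:, list(cols)], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return np.sqrt(np.mean(resid ** 2)), coef, resid


def sisso(X, y, names, max_dim=2, k=20):
    """Minimal SISSO-style loop: SIS on the current residual, then an exhaustive L0 search.

    Requires k >= max_dim so that every dimension has at least one candidate tuple.
    """
    F, labels = build_feature_space(X, names)
    selected = []  # union of all SIS-selected subspaces so far
    residual = y - y.mean()
    best = None
    for d in range(1, max_dim + 1):
        for idx in sis(F, residual, k):
            i = int(idx)
            if i not in selected:
                selected.append(i)
        # Sparsifying operator (L0): fit every d-tuple of selected features, keep the lowest RMSE.
        best = min((fit(F, y, cols) + (cols,) for cols in itertools.combinations(selected, d)),
                   key=lambda r: r[0])
        residual = best[2]  # the next SIS round screens against what this model misses
    rmse, coef, _, cols = best
    return rmse, [labels[c] for c in cols], coef
```

With k smaller than the number of candidates, SIS is what keeps the ℓ0 step tractable: the exhaustive search only scans combinations of the screened subset rather than of the full feature space, which in practice grows combinatorially with the number of primary parameters and operators.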