Zr metallocenes have significant potential to be highly tunable
polyethylene catalysts through modification of the aromatic ligand
framework. Here we report the development of multiple machine learning
models using a large library (>700 systems) of DFT-calculated zirconocene
properties and barriers for ethylene polymerization. We show that
very accurate machine learning models are possible for HOMO–LUMO
gaps of precatalysts, but their performance depends significantly on
the machine learning algorithm and type of featurization, such as
fingerprints, Coulomb matrices, smooth overlap of atomic positions,
or persistence images. Surprisingly, the description of the bonding
hapticity, the number of direct connections between Zr and the ligand
aromatic carbons, has only a moderate influence on the performance
of most models. Although the HOMO–LUMO gap models are robust, machine
learning models of this kind, based on structure-connectivity-type
features, perform poorly in predicting ethylene migratory insertion
barrier heights. Therefore, we developed several relatively robust
and accurate machine learning models for barrier heights that are
based on quantum-chemical descriptors (QCDs). The quantitative accuracy
of these models depends on which potential energy surface structure
the QCDs were harvested from. A Hammett-type principle emerges
naturally: QCDs from the π-coordination complexes provide much better
descriptions of the transition states than QCDs from other potential
energy surface structures. Feature importance analysis of the QCDs
provides several fundamental principles that influence zirconocene
catalyst reactivity.
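
As an illustration of one of the structure-based featurizations named above, the following is a minimal sketch of a Coulomb matrix in its commonly used form (diagonal 0.5·Z^2.4, off-diagonal Z_i·Z_j/R_ij). It is not the code used in this work: the function names, the three-atom Zr/C/C fragment, and its coordinates are hypothetical and chosen only for demonstration, and the distance units are left to the caller.

```python
import math


def coulomb_matrix(charges, coords):
    """Build a Coulomb matrix from nuclear charges and Cartesian coordinates.

    Diagonal:     0.5 * Z_i ** 2.4
    Off-diagonal: Z_i * Z_j / |R_i - R_j|
    Units of the coordinates are an assumption left to the caller.
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


def upper_triangle(matrix):
    """Flatten the upper triangle (diagonal included) into a feature vector."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


# Hypothetical toy fragment: one Zr and two C atoms (not a real geometry).
charges = [40, 6, 6]
coords = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (0.0, 2.5, 0.0)]

cm = coulomb_matrix(charges, coords)
features = upper_triangle(cm)
```

The matrix is symmetric, so the flattened upper triangle carries all of its information. Because the raw matrix depends on atom ordering, a real workflow would typically sort rows and columns or use eigenvalues before passing the vector to a regression model.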