We report the transition metal quantum mechanics (tmQM) data set,
which contains the geometries and properties of a large transition
metal–organic compound space. tmQM comprises 86,665 mononuclear
complexes extracted from the Cambridge Structural Database, including
Werner, bioinorganic, and organometallic complexes based on a large
variety of organic ligands and 30 transition metals (the 3d, 4d, and
5d series, groups 3 to 12). All complexes are closed-shell, with a formal
charge in the range {+1, 0, −1}e. The tmQM
data set provides the Cartesian coordinates of all metal complexes
optimized at the GFN2-xTB level, and their molecular size, stoichiometry,
and metal node degree. The quantum properties were computed at the
DFT(TPSSh-D3BJ/def2-SVP) level and include the electronic and dispersion
energies, highest occupied molecular orbital (HOMO) and lowest unoccupied
molecular orbital (LUMO) energies, HOMO/LUMO gap, dipole moment,
and natural charge of the metal center; GFN2-xTB polarizabilities
are also provided. Pairwise representations showed low correlation
between these properties and provided nearly continuous maps that include
unusual regions of chemical space, for example, complexes combining large
polarizabilities with wide HOMO/LUMO gaps and complexes combining
low-energy HOMOs with electron-rich metal centers. The tmQM
data set can be exploited in the data-driven discovery of new metal
complexes, including predictive models based on machine learning.
These models may have a strong impact on the fields in which transition
metal chemistry plays a key role, for example, catalysis, organic
synthesis, and materials science. tmQM is an open data set that can
be downloaded free of charge from
.
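
The pairwise analysis described above can be illustrated with a minimal sketch. It assumes the properties have already been loaded into plain Python dictionaries; the record layout, the key names (`homo`, `lumo`, `polarizability`), the helper functions, the thresholds, and all numerical values are hypothetical and are not taken from tmQM or its file format. The sketch derives the HOMO/LUMO gap as the LUMO energy minus the HOMO energy, computes Pearson correlations for every pair of properties, and filters for complexes that combine a large polarizability with a wide gap.

```python
import math
from itertools import combinations

# Hypothetical records in arbitrary units; the values are invented for
# illustration and do not come from the tmQM data set.
records = [
    {"name": "A", "homo": -0.25, "lumo": -0.05, "polarizability": 300.0},
    {"name": "B", "homo": -0.20, "lumo": -0.10, "polarizability": 450.0},
    {"name": "C", "homo": -0.22, "lumo": -0.02, "polarizability": 500.0},
    {"name": "D", "homo": -0.18, "lumo": -0.12, "polarizability": 250.0},
]


def with_gap(record):
    """Return a copy of the record with the HOMO/LUMO gap (LUMO minus HOMO)."""
    out = dict(record)
    out["gap"] = record["lumo"] - record["homo"]
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(rows, keys):
    """Map every unordered pair of property keys to its Pearson correlation."""
    return {
        (a, b): pearson([r[a] for r in rows], [r[b] for r in rows])
        for a, b in combinations(keys, 2)
    }


def large_polarizability_wide_gap(rows, min_polarizability, min_gap):
    """Select rows that combine a large polarizability with a wide gap."""
    return [
        r for r in rows
        if r["polarizability"] >= min_polarizability and r["gap"] >= min_gap
    ]


rows = [with_gap(r) for r in records]
correlations = pairwise_correlations(rows, ["homo", "gap", "polarizability"])
unusual = large_polarizability_wide_gap(rows, min_polarizability=400.0, min_gap=0.15)
```

A correlation close to zero for a given pair would indicate that the two properties vary largely independently, which is the situation in which such combined filters can return complexes from sparsely populated regions of the property maps.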