Chemical databases are an essential tool for data-driven investigation
of structure–property relationships and for the design of
novel functional compounds. We introduce the first phase of the COMPAS
Project: a COMputational database of Polycyclic Aromatic Systems.
In this phase, we developed two data sets containing the optimized
ground-state structures and a selection of molecular properties of
∼34k and ∼9k cata-condensed polybenzenoid
hydrocarbons (at the GFN2-xTB and B3LYP-D3BJ/def2-SVP levels, respectively)
and placed them in the public domain. Herein, we describe the process
of the data set generation, detail the information available within
the data sets, and show the fundamental features of the generated
data. We analyze the correlation between the two types of computations
as well as the structure–property relationships of the calculated
species. The data and insights gained from them can inform rational
design of novel functional aromatic molecules for use in, e.g., organic
electronics, and can provide a basis for additional data-driven machine-
and deep-learning studies in chemistry.