Heterogeneities such
as point defects, inherent to material systems,
can profoundly influence material functionalities critical for numerous
energy applications. In principle, this influence can be identified
and quantified by developing large defect data sets, which
we call the defect genome, using high-throughput ab initio calculations. However, high-throughput screening of material models
with point defects dramatically increases the computational complexity
and the chemical search space, creating major impediments to developing
a defect genome. In this work, we overcome these impediments by employing
computationally tractable ab initio models, driven
by highly scalable workflows, to study the formation and interaction of
various point defects (e.g., O vacancies, H interstitials, and Y substitutional
dopants) in over 80 cubic perovskites for potential proton-conducting
ceramic fuel cell (PCFC) applications. The resulting defect data sets
identify several promising perovskite compounds that can exhibit high
proton conductivity. Furthermore, the data sets enable us to
identify and explain insightful and novel correlations among defect
energies, material identities, and defect-induced local structural
distortions. Such defect data sets and the resulting correlations are
necessary for building statistical machine learning models, which are
required to accelerate the discovery of new materials.
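As a point of reference for the "formation and interaction" of point defects discussed above, the sketch below encodes the textbook supercell expressions for a defect formation energy and a defect-pair binding energy. It is an assumption that these standard definitions match the ones used in this work. The function names, the sign convention for binding, and all numbers are hypothetical and serve only as illustration; they are not results from the defect data sets.

```python
from typing import Dict


def formation_energy(
    e_defect: float,
    e_host: float,
    dn: Dict[str, int],
    mu: Dict[str, float],
    q: int = 0,
    e_vbm: float = 0.0,
    e_fermi: float = 0.0,
    e_corr: float = 0.0,
) -> float:
    """Textbook supercell defect formation energy in eV (hypothetical helper).

    E_f = E_defect - E_host - sum_i dn_i * mu_i + q * (E_VBM + E_F) + E_corr
    dn[el] > 0 means atoms of element el were added; dn[el] < 0 means removed.
    """
    # Chemical-potential term for atoms exchanged with reservoirs
    chem = sum(n * mu[el] for el, n in dn.items())
    # Charge term references the Fermi level to the valence-band maximum
    return e_defect - e_host - chem + q * (e_vbm + e_fermi) + e_corr


def binding_energy(ef_complex: float, ef_a: float, ef_b: float) -> float:
    """Binding energy of a defect pair (hypothetical helper).

    Under this assumed sign convention a negative value means the complex
    is favored over the isolated defects; conventions differ between studies.
    """
    return ef_complex - ef_a - ef_b


if __name__ == "__main__":
    # Made-up total energies (eV) for a neutral O vacancy, for illustration only
    ef_vo = formation_energy(e_defect=-93.0, e_host=-100.0,
                             dn={"O": -1}, mu={"O": -4.9})
    print(f"E_f(V_O, neutral) = {ef_vo:.2f} eV")
    print(f"E_b(pair) = {binding_energy(1.0, 0.8, 0.6):.2f} eV")
```

The same helper covers the other defect types named above by changing `dn` and `q`: for example, a Y dopant on a Zr site would use `dn={"Y": 1, "Zr": -1}`, and a protonic H interstitial would use `dn={"H": 1}` with `q=1`.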