The remarkable capabilities of generative artificial intelligence models
have inevitably led to their application in drug discovery. Within this
domain, the vastness of chemical space motivates the development of more
efficient methods for identifying regions populated by molecules that
exhibit desired characteristics. In this work,
we present a computationally efficient active learning methodology
and demonstrate its applicability to targeted molecular generation.
When applied to c-Abl kinase, a protein with FDA-approved small-molecule
inhibitors, the model learns to generate molecules similar to these
inhibitors without prior knowledge of their existence and even reproduces
two of them exactly. We also show that the methodology is effective
for the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme,
a target without any commercially available small-molecule inhibitors. To
facilitate implementation and reproducibility, we have made all of our
software available through the open-source ChemSpaceAL Python package.