DNA can stabilize silver nanoclusters (Ag$_N$-DNAs) whose atomic
sizes and diverse fluorescence colors are
selected by nucleobase sequence. These programmable nanoclusters hold
promise for sensing, bioimaging, and nanophotonics. However, DNA’s
vast sequence space challenges the design and discovery of Ag$_N$-DNAs
with tailored properties. In particular,
Ag$_N$-DNAs with bright near-infrared luminescence
above 800 nm remain rare, placing limits on their applications for
bioimaging in the tissue transparency windows. Here, we present a
design method for near-infrared emissive Ag$_N$-DNAs. By combining
high-throughput experimentation and machine
learning with fundamental information from Ag$_N$-DNA crystal
structures, we distill the salient DNA sequence
features that determine Ag$_N$-DNA color,
for the entire known spectral range of these nanoclusters. A succinct
set of nucleobase staple features is predictive of Ag$_N$-DNA color.
By representing DNA sequences in terms
of these motifs, our machine learning models increase the design success
for near-infrared emissive Ag$_N$-DNAs by a factor of 12.3
compared to the training data, nearly doubling the number
of known Ag$_N$-DNAs with bright near-infrared
luminescence above 800 nm. These results demonstrate how incorporating
known structure–property relationships into machine learning
models can enhance materials study and design, even for sparse and
imbalanced training data.
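
As a rough illustration of what representing a DNA sequence in terms of motifs can look like, the sketch below turns a sequence into a vector of motif counts that a machine learning model could take as input. It is a minimal sketch under stated assumptions, not the method used in this work: the motif vocabulary `MOTIFS`, the function name `motif_counts`, and the example sequence are all hypothetical, and the actual staple features are not listed here.

```python
# Hypothetical motif vocabulary, chosen only for illustration; the actual
# nucleobase staple features are not specified in this text.
MOTIFS = ("CC", "GG", "AC", "TG")


def motif_counts(sequence, motifs=MOTIFS):
    """Return overlapping occurrence counts of each motif in a DNA sequence.

    The result is a list with one integer per motif, in the order given
    by `motifs`, so it can serve as a fixed-length feature vector.
    """
    seq = sequence.upper()
    features = []
    for motif in motifs:
        k = len(motif)
        # Slide a window of length k over the sequence and count matches.
        count = sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)
        features.append(count)
    return features


if __name__ == "__main__":
    # Example sequence is made up for demonstration purposes.
    print(motif_counts("CACCTAGCGA"))
```

A count-based vector like this discards motif positions; whether position matters for the features used in this work is not stated here, so the choice is an assumption of the sketch.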