Self-assembling peptide nanostructures are of great importance
in nature and have many promising applications,
for example, in medicine as drug-delivery vehicles, biosensors, and
antivirals. Because such peptides are very promising candidates for
the growing field of bottom-up manufacture of functional nanomaterials,
previous work (Frederix et al., 2011 and 2015) screened all possible
amino acid combinations for di- and tripeptides in search of such materials.
However, the enormous complexity and variety of linear combinations
of the 20 amino acids make exhaustive simulation of all combinations
of tetrapeptides and above infeasible. Therefore, we have developed
an active machine-learning method (also known as “iterative
learning” or an “evolutionary search method”) that
leverages a lower-resolution data set encompassing the whole search
space and a just-in-time high-resolution data set which further analyzes
those target peptides selected by the lower-resolution model. The
method uses the data newly generated at each iteration to improve both
the lower- and higher-resolution models in the search for ideal candidates.
Curation of the lower-resolution data set is explored as a method
to control the selected candidates, based on criteria such as log P.
A major aim of this method is to produce the best results
in the least computationally demanding way. This model has been developed
to be broadly applicable to other search spaces with minor changes
to the algorithm, allowing its use in other areas of research.
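
The iterative scheme described above can be illustrated with a minimal
sketch. It is a toy under stated assumptions, not the implementation used
in this work: the function names (expensive_score, fit_surrogate, predict,
active_search), the additive per-residue surrogate, and the aromatic-residue
scoring rule are all hypothetical stand-ins, and tripeptides are used only
to keep the search space small.

```python
import itertools
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 amino acids


def expensive_score(peptide):
    """Hypothetical stand-in for a costly high-resolution evaluation."""
    # Toy rule (assumption): fraction of aromatic residues; not a physical model.
    return sum(1.0 for aa in peptide if aa in "FWY") / len(peptide)


def fit_surrogate(labelled):
    """Cheap low-resolution model: mean observed score per residue type."""
    totals = {aa: [0.0, 0] for aa in AMINO_ACIDS}
    for peptide, score in labelled.items():
        for aa in peptide:
            totals[aa][0] += score
            totals[aa][1] += 1
    return {aa: (s / n if n else 0.0) for aa, (s, n) in totals.items()}


def predict(weights, peptide):
    """Predicted score: average of the per-residue weights."""
    return sum(weights[aa] for aa in peptide) / len(peptide)


def active_search(length=3, n_iter=5, batch=20, seed=0):
    """Alternate between refitting the cheap model and labelling its top picks."""
    rng = random.Random(seed)
    space = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=length)]
    # Start from a random batch evaluated with the expensive method.
    labelled = {p: expensive_score(p) for p in rng.sample(space, batch)}
    for _ in range(n_iter):
        weights = fit_surrogate(labelled)
        unlabelled = [p for p in space if p not in labelled]
        # Rank the whole remaining space with the cheap model.
        unlabelled.sort(key=lambda p: predict(weights, p), reverse=True)
        # Spend the expensive evaluation only on the most promising candidates.
        for p in unlabelled[:batch]:
            labelled[p] = expensive_score(p)
    return labelled


results = active_search()
best = max(results, key=results.get)
print(best, results[best])
```

In this sketch only 120 of the 8,000 possible tripeptides are ever passed to
the expensive evaluation; the cheap model ranks the rest. A real workflow
would replace both scoring functions with the actual lower- and
higher-resolution models.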