Prediction of organismal viability upon exposure to a nanoparticle
in varying environments, as fully specified at the molecular
scale, has emerged as a useful figure of merit in the design
of engineered nanoparticles. We build on our earlier finding that
a bag of artificial neural networks (ANNs) can provide such a prediction
when such machines are trained with a relatively small data set (with
ca. 200 examples). Therein, viabilities were predicted by consensus
using the weighted means of the predictions from the bags. Here, we
confirm the accuracy and precision of the prediction of nanoparticle
viabilities using an optimized bag of ANNs over sets of data examples
that had not previously been used in the training and validation process.
We also introduce the viability strip, rather than a single value,
as the prediction and construct it from the viability probability
distribution of an ensemble of ANNs compatible with the data set.
Specifically, the ensemble consists of the ANNs arising from subsets
of the data set corresponding to different splittings between training
and validation, and the different bags (k-folds).
A (k − 1)/k machine uses a single partition (or bag)
of k − 1 ANNs, each trained on 1/k of the data, to obtain a consensus prediction, and a k-bag machine quorum samples the k possible
(k − 1)/k machines available for a given partition.
We find that with increasing k in the k-bag or
(k − 1)/k machines, the viability strips become more
normally distributed and their predictions become more precise. Benchmark
comparisons between ensembles of 4-bag machines and
3/4 fraction machines suggest that the
3/4 fraction machine has similar accuracy while
overcoming some of the challenges arising from divergent ANNs in the
4-bag machines.
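
As a rough illustration of the construction described above, the following Python sketch shows one possible reading of the (k − 1)/k and k-bag machines and of how a viability strip might be summarized. It is not the implementation used in this work. Every name in it (make_folds, train_stub, fraction_machine, k_bag_predictions, viability_strip) is hypothetical, a one-dimensional least-squares line stands in for each trained ANN, and the consensus is a plain unweighted mean rather than the weighted mean used in the earlier study.

```python
import random
import statistics


def make_folds(n_examples, k, seed=0):
    """Shuffle example indices and deal them into k nearly equal folds."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_stub(xs, ys):
    """Hypothetical stand-in for training one ANN: a 1-D least-squares line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fraction_machine(models, left_out):
    """(k-1)/k machine: unweighted consensus of k-1 models, omitting one."""
    members = [m for i, m in enumerate(models) if i != left_out]
    return lambda x: statistics.fmean([m(x) for m in members])


def k_bag_predictions(models, x):
    """k-bag machine: predictions of all k possible (k-1)/k machines."""
    return [fraction_machine(models, j)(x) for j in range(len(models))]


def viability_strip(xs, ys, k, x_query, n_splits=20):
    """Pool k-bag predictions over several random partitions of the data.

    Returns the mean, the sample standard deviation, and the pooled
    predictions; the strip is summarized here by mean and spread only.
    """
    pooled = []
    for seed in range(n_splits):
        folds = make_folds(len(xs), k, seed)
        # One model per fold, each trained on 1/k of the data.
        models = [
            train_stub([xs[i] for i in fold], [ys[i] for i in fold])
            for fold in folds
        ]
        pooled.extend(k_bag_predictions(models, x_query))
    return statistics.fmean(pooled), statistics.stdev(pooled), pooled
```

Under these assumptions, calling viability_strip(xs, ys, k=4, x_query=...) on a list of descriptor values xs and measured viabilities ys returns a central prediction together with a spread, which is the sense in which a strip replaces a single value. Swapping train_stub for an actual ANN trainer would leave the rest of the structure unchanged.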