Producing perfectly regulated nanoparticle samples on a large scale
is challenging and costly for manufacturers, so the ability to define
and reproduce classes of nanoparticles with similar characteristics
is attractive. However, developing structure/class or process/structure/class
relationships is not straightforward. In this study we propose a machine
learning pipeline that groups nanoparticles by their similarity in a
high-dimensional feature space via clustering, predicts the nanoparticle
classes from their structural features via classification, and
identifies the relevant features that should be tuned to produce
a specific class via causal inference. Using a simulated ruthenium
nanoparticle data set as an exemplar, we show that a support vector
machine trained on 22 structural features classifies ruthenium
nanoparticles as ordered crystalline, polycrystalline, or disordered
noncrystalline with high accuracy, precision, and recall, and with
virtually no overfitting or underfitting. A Bayesian network with
domain knowledge incorporated via interactive learning was trained
using a hill climbing algorithm to confirm which features cause
the classes, as opposed to merely being correlated with them.
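
The first two stages of the pipeline described above (clustering, then
classification of the resulting classes) can be sketched roughly as
follows. This is a minimal illustration under stated assumptions, not
the implementation used in the study: the data are synthetic blobs from
scikit-learn standing in for the ruthenium data set, k-means is an
assumed stand-in for the clustering method (which is not named here),
and the function name run_pipeline and all parameter values are
hypothetical. The causal inference stage is not covered.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def run_pipeline(seed=0):
    """Cluster synthetic samples, then train an SVM to predict the clusters."""
    # ASSUMPTION: synthetic stand-in data with 22 features and 3 latent
    # groups; the real structural features are not reproduced here.
    X, _ = make_blobs(n_samples=600, n_features=22, centers=3,
                      random_state=seed)

    # Stage 1 (clustering): group samples by similarity in feature space.
    # ASSUMPTION: k-means with k=3 is used purely for illustration.
    labels = KMeans(n_clusters=3, n_init=10,
                    random_state=seed).fit_predict(X)

    # Stage 2 (classification): predict the cluster-derived class from the
    # features, holding out 25% of the samples for evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.25, stratify=labels, random_state=seed)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)

    # A small gap between train and test accuracy is one rough indicator
    # that the model is neither overfitting nor underfitting.
    return {
        "train_accuracy": accuracy_score(y_train, clf.predict(X_train)),
        "test_accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="macro"),
        "recall": recall_score(y_test, y_pred, average="macro"),
    }


if __name__ == "__main__":
    for name, value in run_pipeline().items():
        print(f"{name}: {value:.3f}")
```

Comparing the train and test accuracies alongside macro-averaged
precision and recall mirrors the kind of checks mentioned above, but
the numbers produced by this sketch say nothing about the real data.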