The application of machine learning (ML) to materials chemistry
can accelerate the design process, and when coupled with a detailed
explanation, can guide future research. Shapley value analysis is
a complementary approach that can explain the underlying reasons
behind a structure/property relationship.
In this study, we have used data sets of graphene oxide nanomaterials
generated using electronic structure simulations to train ML models
with outstanding accuracy, generalizability, and stability to predict
the formation energy and the Fermi energy, and applied Shapley value
analysis to understand the results. Feature importance profiles that
rank the value of structural characteristics to each property confirmed
that the underlying structure/property relationships are relatively
simple and scientifically intuitive, even though the ML models need
complex information to achieve high performance. We also report
instance influence profiles that rank the value of each individual
graphene oxide structure to the training process. Feature/instance
interactions are also investigated to explain which structural characteristics
make particular structures influential, revealing that the most influential
structures typically have very high or very low concentrations of
H or O. Because the range of concentrations is typically chosen by
researchers based on domain knowledge at the outset, extreme care
should be taken when gathering training data, as these decisions
have a large impact on the final trained model. In general,
the reproducible workflow demonstrated here can be applied to any
similar materials data set to make reliable model-agnostic predictions
of how the structural characteristics and individual structures contribute
to the prediction of functional properties.
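
As an illustration of the Shapley value analysis referred to above, the following is a minimal sketch of an exact Shapley value calculation for a small cooperative game. It is not the workflow used in this study: the function names, the feature names (`O_conc`, `H_conc`, `size`), and the toy value function are hypothetical and chosen only to show how a contribution is attributed to each "player". Exact enumeration scales exponentially with the number of players, so practical analyses generally rely on approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a cooperative game.

    players: list of hashable player labels (e.g. feature names).
    value:   function mapping a frozenset of players to a float payoff.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s.
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def toy_value(coalition):
    """Hypothetical payoff: NOT a trained model, just an illustrative game.

    O and H concentration each contribute on their own and share an
    interaction term; 'size' contributes nothing.
    """
    v = 0.0
    if "O_conc" in coalition:
        v += 0.5
    if "H_conc" in coalition:
        v += 0.3
    if "O_conc" in coalition and "H_conc" in coalition:
        v += 0.2  # interaction, split equally between the two players
    return v


features = ["O_conc", "H_conc", "size"]
phi = shapley_values(features, toy_value)
for name in features:
    print(f"{name}: {phi[name]:.3f}")
```

In this toy game the interaction term is shared equally between the two concentration features, the irrelevant feature receives zero, and the values sum to the payoff of the full coalition. The same function could in principle be used for instance influence by treating training structures as the players and a model score on the chosen subset as the value function, although that is an assumption about how such a profile might be sketched rather than a description of the method used here.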