Machine learning methods offer the opportunity to design new functional
materials on an unprecedented scale; however, building the large,
diverse databases of molecules on which to train such methods remains
a daunting task. Automated computational chemistry modeling workflows
are therefore becoming essential tools in this data-driven hunt for
new materials with novel properties, since they offer a means by which
to create and curate molecular databases without requiring significant
levels of user input, thereby helping to mitigate well-founded concerns
regarding data provenance, reproducibility, and replicability.
We have developed a versatile software package, PySoftK (Python Soft
Matter at King’s College London), that provides flexible, automated
computational workflows to create, model, and curate libraries of
polymers with minimal user intervention. PySoftK
is available as an efficient, fully tested, and easily installable
Python package. Key features of the software include the wide range
of polymer topologies that can be generated automatically and its fully
parallelized library generation tools. It is anticipated
that PySoftK will support the generation, modeling, and curation of
large polymer libraries to aid functional materials discovery
in the nanotechnology and biotechnology arenas.