Computational chemistry and other
simulation fields are critically
dependent on computing resources, but few problems scale efficiently
to the hundreds of thousands of processors available in current supercomputers, particularly
for molecular dynamics. This has become a bottleneck, since new hardware
generations primarily provide more processing units rather than making
individual units much faster; simulation applications address it
by increasingly focusing on sampling, with algorithms such as free-energy
perturbation, Markov state modeling, metadynamics, or milestoning.
All these rely on combining results from multiple simulations into
a single observation. They are potentially powerful approaches that
aim to predict experimental observables directly, but this comes at
the expense of added complexity in selecting sampling strategies and
keeping track of dozens to thousands of simulations and their dependencies.
Here, we describe how the distributed execution framework Copernicus
allows such algorithms to be expressed as generic workflows: dataflow programs. Because a dataflow program explicitly states the
dependencies of each constituent part, an algorithm only needs to be
described at a conceptual level, after which its execution is maximally
parallel.
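To make the dataflow idea concrete, the following is a minimal Python sketch, not the Copernicus API: run_simulation, combine, and the eight starting conformations are all illustrative assumptions. Because the only declared dependency is that the combining step needs every simulation's result, the simulations themselves may run in any order, and in parallel.

```python
# Hypothetical sketch of a dataflow-style program; names are
# illustrative assumptions, not the Copernicus API. Tasks declare
# their inputs explicitly, so independent simulations run in parallel
# and the analysis step runs only once all its dependencies are done.
from concurrent.futures import ProcessPoolExecutor

def run_simulation(start_conf):
    # Placeholder for launching one MD simulation (e.g., a GROMACS run).
    return {"trajectory": f"traj_{start_conf}.xtc"}

def combine(results):
    # Placeholder for the analysis that merges many simulations into
    # a single observation (e.g., building a Markov state model).
    return {"n_trajectories": len(results)}

if __name__ == "__main__":
    start_confs = range(8)  # eight independent starting conformations
    with ProcessPoolExecutor() as pool:
        # No ordering is imposed among the simulations themselves: the
        # only stated dependency is combine() on all of their results.
        results = list(pool.map(run_simulation, start_confs))
    print(combine(results))
```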
The fully automated execution facilitates the optimization
of these algorithms with adaptive sampling, where
undersampled regions are automatically detected and targeted without
user intervention.
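The adaptive-sampling idea can be sketched under simple assumptions: below, a discrete-state random walk stands in for an MD run and per-state visit counts stand in for the sampling metric; none of these names come from Copernicus. Each round seeds new simulations from the least-visited states, so undersampled regions are targeted automatically.

```python
# Illustrative adaptive-sampling loop (a sketch under assumed
# simplifications, not Copernicus code). Each round, new simulations
# start from the states visited least often so far, targeting
# undersampled regions without user intervention.
import random
from collections import Counter

N_STATES, N_SIMS, N_ROUNDS = 20, 8, 5

def simulate(start_state):
    # Stand-in for an MD run: a short random walk over discrete states.
    state, visited = start_state, []
    for _ in range(50):
        state = max(0, min(N_STATES - 1, state + random.choice((-1, 1))))
        visited.append(state)
    return visited

counts = Counter()
seeds = [random.randrange(N_STATES) for _ in range(N_SIMS)]
for _ in range(N_ROUNDS):
    for s in seeds:
        counts.update(simulate(s))
    # Seed the next round from the least-visited states seen so far.
    seeds = sorted(counts, key=counts.get)[:N_SIMS]
print("visit counts:", dict(sorted(counts.items())))
```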
We show how several such algorithms can be formulated
for computational chemistry problems, and how Copernicus executes them
efficiently with many loosely coupled simulations on either distributed
or parallel resources.