All-atom molecular dynamics (MD) simulations offer a powerful tool
for molecular modeling, but the short time steps required for numerical
stability of the integrator place many interesting molecular events
out of reach of unbiased simulations. The popular and powerful Markov
state modeling (MSM) approach can extend these time scales by stitching
together multiple short discontinuous trajectories into a single long-time
kinetic model, but it necessitates a configurational coarse-graining of
the phase space that entails a loss of spatial and temporal resolution
and an exponential increase in complexity for multimolecular systems.
Latent space simulators (LSS) present an alternative formalism that
employs a dynamical, as opposed to configurational, coarse-graining
comprising three back-to-back learning problems to (i) identify the
molecular system’s slowest dynamical processes, (ii) propagate
the microscopic system dynamics within this slow subspace, and (iii)
generatively reconstruct the trajectory of the system within the molecular
phase space. A trained LSS model can generate temporally and spatially
continuous synthetic molecular trajectories at orders of magnitude
lower cost than MD, improving sampling of rare transition events and
metastable states and thereby reducing statistical uncertainties in thermodynamic
and kinetic observables. In this work, we extend the LSS formalism
to short discontinuous training trajectories generated by distributed
computing and to multimolecular systems without incurring exponential
scaling in computational cost. First, we develop a distributed LSS
model over thousands of short simulations of a 264-residue proteolysis-targeting
chimera (PROTAC) complex to generate ultralong continuous trajectories
that identify metastable states and collective variables to inform
PROTAC therapeutic design and optimization. Second, we develop a multimolecular
LSS architecture to generate physically realistic ultralong trajectories
of DNA oligomers that can undergo both duplex hybridization and hairpin
folding. These trajectories retain thermodynamic and kinetic characteristics
of the training data while providing increased precision of folding
populations and time scales across simulation temperature and ion
concentration.
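
As a minimal illustration of how short discontinuous trajectories can be combined into one kinetic model, the sketch below collects time-lagged pairs within each trajectory only, never across trajectory boundaries, and row-normalizes the resulting transition counts into an MSM-style transition matrix. It is a toy example under stated assumptions, not the method used in this work: the trajectories are assumed to be already discretized into integer state labels, and the names `lagged_pairs` and `transition_matrix` are hypothetical helpers introduced here for illustration.

```python
import numpy as np

def lagged_pairs(trajs, lag):
    """Collect (x_t, x_{t+lag}) pairs within each trajectory only.

    Pairs never span two different trajectories, so discontinuous
    runs can be pooled safely. Assumes lag >= 1 and that at least one
    trajectory is longer than the lag.
    """
    starts, ends = [], []
    for traj in trajs:
        traj = np.asarray(traj)
        if len(traj) > lag:  # skip trajectories too short to give a pair
            starts.append(traj[:-lag])
            ends.append(traj[lag:])
    return np.concatenate(starts), np.concatenate(ends)

def transition_matrix(trajs, n_states, lag=1):
    """Row-normalized transition count matrix at the given lag time."""
    src, dst = lagged_pairs(trajs, lag)
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (src, dst), 1.0)  # accumulate repeated (i, j) pairs
    row_sums = counts.sum(axis=1, keepdims=True)
    # States with no observed outgoing transitions keep an all-zero row.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# Three short, independent toy trajectories over two states (made-up data).
trajs = [[0, 0, 1, 1], [1, 0, 0], [1, 1, 1, 0]]
src, dst = lagged_pairs(trajs, lag=1)
T = transition_matrix(trajs, n_states=2, lag=1)
print(T)
```

The same within-trajectory pairing is the natural way to assemble time-lagged training pairs from distributed simulation data for any model that learns from pairs of frames separated by a fixed lag.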