Dynamic programming
algorithms within the NUPACK software suite
enable analysis of nucleic acid sequences over complex and test tube
ensembles containing arbitrary numbers of interacting strand species,
serving the needs of researchers in molecular programming, nucleic
acid nanotechnology, synthetic biology, and across the life sciences.
Here, to enhance the underlying physical model, ensure scalability
for large calculations, and achieve dramatic speedups when calculating
diverse physical quantities over complex and test tube ensembles,
we introduce a unified dynamic programming framework that combines
three ingredients: (1) recursions that specify the dependencies between
subproblems and incorporate the details of the structural ensemble
and the free energy model, (2) evaluation algebras that define the
mathematical form of each subproblem, and (3) operation orders that specify
the computational trajectory through the dependency graph of subproblems.
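
To illustrate how these three ingredients separate, the following is a minimal sketch on a toy nested-base-pair model. It is not the NUPACK recursions or free energy model. The names `Algebra`, `evaluate`, `PAIR_ENERGY`, and `MIN_LOOP`, as well as the energy values, are assumptions introduced purely for illustration. A single recursion is written once against an abstract algebra; swapping the algebra changes the quantity computed (partition function, minimum free energy, or structure count), while memoization stands in for one possible operation order.

```python
import math
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable

KT = 0.6163  # kcal/mol near 37 C (illustrative value)
MIN_LOOP = 3  # toy constraint: at least 3 unpaired bases inside a pair
# Hypothetical toy pair energies in kcal/mol (not a real parameter set).
PAIR_ENERGY = {frozenset("AU"): -1.0, frozenset("GC"): -2.0}

@dataclass(frozen=True)
class Algebra:
    """An evaluation algebra: how subproblem values are combined."""
    zero: float                              # identity for plus
    one: float                               # identity for times
    plus: Callable[[float, float], float]    # combine alternatives
    times: Callable[[float, float], float]   # combine independent parts
    weight: Callable[[float], float]         # map a free energy to an element

SUM_PRODUCT = Algebra(0.0, 1.0, lambda a, b: a + b, lambda a, b: a * b,
                      lambda e: math.exp(-e / KT))   # partition function
MIN_SUM = Algebra(math.inf, 0.0, min, lambda a, b: a + b,
                  lambda e: e)                       # minimum free energy
COUNT = Algebra(0, 1, lambda a, b: a + b, lambda a, b: a * b,
                lambda e: 1)                         # number of structures

def evaluate(seq: str, alg: Algebra) -> float:
    """Evaluate the toy recursion over seq under the given algebra."""
    @lru_cache(maxsize=None)  # operation order: top-down with memoization
    def q(i: int, j: int) -> float:
        if j < i:
            return alg.one  # empty subsequence
        total = q(i, j - 1)  # case 1: base j is unpaired
        for k in range(i, j - MIN_LOOP):  # case 2: base j pairs with base k
            e = PAIR_ENERGY.get(frozenset((seq[k], seq[j])))
            if e is not None:
                term = alg.times(alg.times(q(i, k - 1), alg.weight(e)),
                                 q(k + 1, j - 1))
                total = alg.plus(total, term)
        return total
    return q(0, len(seq) - 1)

if __name__ == "__main__":
    for name, alg in [("Q", SUM_PRODUCT), ("MFE", MIN_SUM), ("count", COUNT)]:
        print(name, evaluate("GAAAC", alg))
```

In this sketch the recursion body never mentions sums, minima, or exponentials directly, so adding a new quantity only requires defining a new `Algebra` instance.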
The physical model is enhanced using new recursions that operate over
the complex ensemble including coaxial and dangle stacking subensembles.
The recursions are coded generically and then compiled with a quantity-specific
evaluation algebra and operation order to generate an executable for
each physical quantity: partition function, equilibrium base-pairing
probabilities, MFE energy and proxy structure, suboptimal proxy structures,
and Boltzmann sampled structures. For large complexes (e.g., 30 000
nt), scalability is achieved for partition function calculations using
an overflow-safe evaluation algebra, and for equilibrium base-pairing
probabilities using a backtrack-free operation order. A new blockwise
operation order that treats subcomplex blocks for the complex species
in a test tube ensemble enables dramatic speedups (e.g., 20–120×)
using vectorization and caching. With these performance enhancements,
equilibrium analysis of substantial test tube ensembles can be performed
in ≤ 1 min on a single computational core (e.g., partition
function and equilibrium concentration for all complex species of
up to six strands formed from two strand species of 300 nt each, or
for all complex species of up to two strands formed from 80 strand
species of 100 nt each). A new sampling algorithm simultaneously samples
multiple structures from the complex ensemble to yield speedups of
an order of magnitude or more as the number of structures increases
above ≈10³. These advances are available within
the NUPACK 4.0 code base, which can be flexibly scripted using the all-new NUPACK Python
module.
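
The need for an overflow-safe evaluation algebra in large partition function calculations can be seen from a short sketch. The abstract does not describe how the NUPACK algebra is constructed, so the example below swaps in a standard log-domain representation as a stand-in; the function names `log_add` and `log_mul` are assumptions made for illustration. Each value is stored as its natural logarithm, so products become sums and sums use the log-sum-exp identity.

```python
import math

def log_mul(a: float, b: float) -> float:
    """Product of two values stored as logs: log(x * y) = log x + log y."""
    return a + b

def log_add(a: float, b: float) -> float:
    """Sum of two values stored as logs, computed without leaving log space."""
    if a == -math.inf:  # log(0) acts as the additive identity
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    # log(x + y) = hi + log(1 + exp(lo - hi)); exp argument is <= 0, so no overflow
    return hi + math.log1p(math.exp(lo - hi))

if __name__ == "__main__":
    # A direct float product of 2000 Boltzmann-like factors of e overflows...
    try:
        math.exp(2000.0)
    except OverflowError:
        print("exp(2000) overflows a double")
    # ...but the same product is representable in log space.
    log_q = 0.0
    for _ in range(2000):
        log_q = log_mul(log_q, 1.0)
    print(log_q, log_add(log_q, log_q))
```

A log-domain algebra trades some speed for range; other designs, such as storing a mantissa and exponent separately, address the same problem.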