*These authors contributed equally to this work.

DNA origami is a robust molecular assembly technique in which a single-stranded DNA template is folded by annealing it with hundreds of short 'staple' strands [1-4]. The guiding design principle of nanofabrication by DNA self-assembly is that the target structure is the single most stable configuration [5]; however, the pathway and kinetics of origami assembly are poorly understood. The folding transition is cooperative [4,6,7], and there is a strong analogy with protein folding: both are governed by information encoded in a polymer sequence [8-11]. Misfolded structures are kinetic traps. The yield of well-folded DNA origami can be low [2]. Yield can be improved by titrating cations [2,12] or by following empirical design rules [12,13], but it is frequently necessary to separate well-folded origami from misfolded objects [2,3,14-16]. Here, we present an origami structure designed to reveal the assembly process. Our system has the unusual property of having a small set of distinguishable, well-folded shapes that represent discrete and approximately degenerate energy minima in a vast folding landscape. We obtain a high yield of well-folded origami structures, demonstrating the existence of efficient folding pathways. The distribution of shapes provides information about individual trajectories through the folding landscape. We show that the assembly pathway can be steered by rational design, and we identify similarities to protein folding: assembly is highly cooperative; reversible bond formation is important for recovering from transient misfolds; and the early formation of long-range connections can be very effective in forcing particular folds. Expanding the rational design process to include the assembly pathway is key to reproducible synthesis, which is essential if nucleic acid self-assembly is to continue its rapid development [1-3,17-19] and become a reliable manufacturing technology [20].
This study is based on a simplified version of the archetypal origami tile [1] and, in particular, on the distribution of observed folds of a 'dimer' variant that contains two copies of the template sequence in head-to-tail repeat. The 'monomer' tile (Fig. 1) … (Fig. 1c); approximately 80% of tiles appear to be well folded.

The 'dimer' template is also circular. It contains two identical copies of the monomer joined head-to-tail and can therefore bind two copies of each staple (Fig. 2). Each pair of body and seam staples can bind in one of two configurations (Fig. 2a), forming either an internal link within each copy of the monomer sequence or a pair of cross-links between the two copies.

The total number of possible domain pairings is $2^{76} \approx 10^{23}$. Although many of these configurations are sterically inaccessible, it is clear that reducing the specificity of staple binding means that, as in protein folding, the number of possible states of the system is overwhelmingly greater than the number of well-folded structures. However, in contrast to proteins (and t...
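The counting argument for the dimer can be checked with a few lines of Python. This is a minimal sketch, not code from the study: it takes the exponent 76 directly from the text, treats each pairing choice as an independent two-way switch (internal link versus cross-link), and ignores steric exclusion, so it reproduces only the upper bound 2^76. The function names and the 'I'/'X' labels are hypothetical.

```python
import math
from itertools import product

# Illustrative sketch only: the exponent 76 is taken from the text, and every
# binary choice is treated as independent (steric exclusion is ignored), so
# the count is an upper bound on the accessible configurations.
N_CHOICES = 76


def count_pairings(n_choices: int) -> int:
    """Number of assignments when each of n_choices sites has two options."""
    return 2 ** n_choices


def enumerate_configurations(n_choices: int) -> list[tuple[str, ...]]:
    """List every assignment explicitly; feasible only for small toy n.

    'I' marks an internal link within one copy of the monomer sequence and
    'X' marks a cross-link between the two copies (hypothetical labels).
    """
    return list(product("IX", repeat=n_choices))


total = count_pairings(N_CHOICES)
print(total)                          # 75557863725914323419136
print(f"{total:.2e}")                 # 7.56e+22, i.e. roughly 10^23
print(round(math.log10(total), 2))    # 22.88

# A toy dimer with only three binary choices already has 8 configurations.
for config in enumerate_configurations(3):
    print("".join(config))
```

Explicit enumeration is only practical for toy sizes; at 76 choices the list would have roughly 10^23 entries, which is why the closed-form count is used for the real design.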