Protein structures often feature β-sheets in which adjacent β-strands have large sequence separation. How the folding process orchestrates the formation and correct arrangement of these strands is not comprehensively understood. Particularly challenging are proteins in which β-strands at the N and C termini are neighbors in a β-sheet. The N-terminal β-strand is synthesized early on, but it cannot bind to the C terminus before the chain is fully synthesized. During this time, there is a danger that the β-strand at the N terminus interacts with nearby molecules, leading to potentially harmful aggregates of incompletely folded proteins. Simulations of the C-terminal fragment of Top7 show that this risk of misfolding and aggregation can be avoided by a "caching" mechanism that relies on the "chameleon" behavior of certain segments.

protein folding | all-atom simulation | folding mechanism | chameleon segment | nonnative intermediates

Structure and function of proteins are determined by their amino acid sequence. How proteins find their functional native form is a long-standing question (1-3). Protein synthesis is directional, proceeding from the N to the C terminus. In proteins with end-to-end β-sheets, there is a danger that the N-terminal strand binds to nearby molecules or to other parts of the chain, as the strand cannot bind to the C-terminal strand until the molecule is fully synthesized. Misfolding and aggregation may be the consequence. In our simulations, the N terminus of fragment Glu-2-Leu-50 of the 59-residue CFr (Protein Data Bank ID code 2GJH) (4) avoids the risk of misfolding by growing first into a nonnative extension of an existing α-helix. Only after the other structural elements have formed and correctly assembled does the N terminus unfold and attach to the C-terminal β-sheet as its last closing strand.
We speculate that such a temporary caching of β-strands is a common mechanism that eases folding and hinders aggregation.

The C-terminal fragment (CFr) (5) of the designed protein Top7 (6) forms a stable homodimer, whose secondary structure remains nearly unchanged up to 98 °C and at high concentrations of denaturant (4). It is a model for small fast-folding proteins with complex topology and diverse secondary structure elements (see Fig. 1). Such proteins often have contacts between β-strands that are far apart in sequence. Unlike helix contacts, these depend on the conformation of a large segment between the strands. It is unlikely that these contacts form before the intermediate segment has folded, as this would incur a large entropic cost or even interfere with the folding of the connecting segment. For slow-folding proteins, one can conjecture a "backtracking" mechanism (7) in which folding succeeds only after prematurely formed β-contacts are broken. In this study, we explore in silico the behavior of fast-folding proteins, as computational approaches (8, 9) can resolve details of folding that are beyond the reach of experiments.
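To make the notion of long-distance (in sequence) contacts concrete, the following minimal sketch computes the sequence separation |i - j| of each residue contact and the relative contact order, a commonly used summary of how nonlocal a fold is. It is an illustration only and is not part of the simulations described here. The contact list and chain length are hypothetical toy values, not data from CFr, and the function names are our own.

```python
from typing import List, Tuple

Contact = Tuple[int, int]  # pair of residue indices (i, j) that are in contact


def sequence_separations(contacts: List[Contact]) -> List[int]:
    """Return |i - j| for every contact, in input order."""
    return [abs(i - j) for i, j in contacts]


def relative_contact_order(contacts: List[Contact], chain_length: int) -> float:
    """Mean sequence separation of the contacts divided by the chain length."""
    if not contacts or chain_length <= 0:
        return 0.0
    total = sum(abs(i - j) for i, j in contacts)
    return total / (len(contacts) * chain_length)


# Hypothetical toy example (NOT taken from the CFr structure):
# two helix-like local contacts and two contacts between terminal strands.
toy_contacts = [(3, 7), (10, 14), (2, 48), (4, 46)]
toy_length = 50

separations = sequence_separations(toy_contacts)
rco = relative_contact_order(toy_contacts, toy_length)
print(separations, rco)
```

In this toy case the two terminal-strand contacts dominate the average, which mirrors the point made above: contacts between N- and C-terminal strands span almost the entire chain, whereas helix contacts are local.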
Results and Discussion

We find that the CFr monomer folds to a native-like conform...