Summary
Somatic mutations vary in density across cancer genomes. Previous studies have shown that chromatin organization and replication timing domains are correlated with, and thus predictive of, this variation 1,2,3,4,5. Here, we analyse 1,809 whole-genome sequences from nine cancer types 6,7,8 to show that non-B motifs, a subset of repetitive DNA sequences predicted to form non-canonical secondary structures 9,10,11,12, can independently account for variation in mutation density. When non-B motifs are combined with epigenetic factors and replication timing, the variance explained increases to 43-76%. Intriguingly, mutations are enriched ~2-fold directly within non-B motifs. This enrichment is concentrated on exposed structural components and depends on physical properties that are optimal for secondary structure formation. There is therefore mounting evidence that secondary structures arising from non-B motifs are not simply associated with increased mutation density but may be causally implicated. Our results suggest that they are determinants of mutagenesis and increase the likelihood of recurrent mutations in the genome 13,6. This analysis calls for caution in the interpretation of recurrent mutations and highlights the importance of incorporating non-B motifs, which can be inferred directly from the reference sequence, into background models of mutability.

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. bioRxiv preprint doi: http://dx.doi.org/10.1101/146621.
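The ~2-fold enrichment within non-B motifs is a ratio of mutation densities: mutations per base pair inside motifs divided by mutations per base pair elsewhere. The sketch below illustrates that arithmetic under simplifying assumptions: a single chromosome, 0-based half-open motif intervals, and unique mutation positions. The function names (`merge_intervals`, `fold_enrichment`) and the plain inside-versus-outside background are hypothetical illustrations, not the authors' pipeline. That pipeline may correct for sequence composition, replication timing and other covariates.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals.

    Merging prevents base pairs covered by overlapping motifs from being
    counted twice when computing motif length.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def _inside(pos, merged, starts):
    """Return True if pos falls within any merged half-open interval."""
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < merged[i][1]


def fold_enrichment(mutations, motifs, genome_length):
    """Mutation density inside motifs divided by density outside them.

    Assumption: a uniform background over all non-motif sequence; no
    correction for trinucleotide context or other covariates.
    """
    merged = merge_intervals(motifs)
    starts = [s for s, _ in merged]
    motif_bp = sum(e - s for s, e in merged)
    if motif_bp == 0 or motif_bp >= genome_length:
        raise ValueError("motifs must cover some, but not all, of the genome")
    inside = sum(1 for p in mutations if _inside(p, merged, starts))
    outside = len(mutations) - inside
    density_in = inside / motif_bp
    density_out = outside / (genome_length - motif_bp)
    return density_in / density_out


if __name__ == "__main__":
    # Toy, hypothetical data: two overlapping motifs merge into [100, 200).
    toy_motifs = [(100, 150), (140, 200)]
    toy_mutations = [105, 120, 150, 199, 500, 700, 900]
    # 4 mutations in 100 motif bp versus 3 in 900 non-motif bp: about 12-fold.
    print(fold_enrichment(toy_mutations, toy_motifs, genome_length=1000))
```

Merging the motif intervals first is a deliberate choice. Without it, a base pair covered by two overlapping motifs would be counted twice in the denominator, and the in-motif density would be underestimated.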
Main Text
The canonical right-handed DNA double helix, known as B-DNA, has been recognized since 1953. Although B-DNA is the predominant configuration inside the cell, more than 20 non-canonical secondary structures have been reported 9. These alternative structures include triple helices, hairpins, cruciforms and slipped structures. They are more likely to form at particular repetitive sequences such as mirror repeats, inverted repeats, direct repeats and short tandem repeats 10. In vitro studies of prokaryotic 14,15 and eukaryotic cells 16,17,18,19,20,21,22,23,24,25,26 associate non-canonical secondary structures with increased mutability. Here, we methodically explore the relationship between secondary structures and somatic mutability. We focus on seven common types of sequence motifs prone to forming non-canonical secondary structures, hereafter referred to as non-B DNA motifs for brevity: direct repeats (DR), G-quadruplexes (G4), inverted repeats (IR), mirror repeats (MR), H-DNA, short tandem repeats (STR) and Z-DNA (Fig. 1a-f; definitions of each are given in Methods).

We systematically examined the occurrence of each of the seven non-B DNA motifs in the human reference sequence (Methods) 11. Most motifs are <50 bp long (Fig. 1g), and each category encompasses 0.07% to 4% of the human geno...