Background: Structural variants (SVs) are known to play important roles in a variety of cancers, but their origins and functional consequences are still poorly understood. Many SVs are thought to emerge via errors in the repair processes following DNA double strand breaks (DSBs), and previous studies have experimentally measured DSB frequencies across the genome in cell lines.
Results: Using these data, we derive the first quantitative genome-wide models of DSB susceptibility, based upon underlying chromatin and sequence features. These models are accurate and provide novel insights into the mutational mechanisms generating DSBs. Models trained in one cell type can be successfully applied to others, but a substantial proportion of DSBs appear to reflect cell-type-specific processes. When model predictions are used as a proxy for susceptibility to DSBs in tumours, many SV-enriched regions appear to be poorly explained by selectively neutral mutational bias alone. A substantial number of these regions show unexpectedly high SV breakpoint frequencies given their predicted susceptibility to mutation and are therefore credible targets of positive selection in tumours. These putatively positively selected SV hotspots are enriched for genes previously shown to be oncogenic. In contrast, several hundred regions across the genome show unexpectedly low levels of SVs, given their relatively high susceptibility to mutation. These novel 'coldspot' regions appear to be subject to purifying selection in tumours and are enriched for active promoters and enhancers.
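The hotspot/coldspot reasoning above (observed SV breakpoints compared against model-predicted susceptibility) can be illustrated with a minimal sketch. It assumes, hypothetically, that each genomic region has an observed breakpoint count and a model-predicted relative susceptibility. The susceptibilities are rescaled into expected counts, and one-sided Poisson tests with a Bonferroni correction flag regions with too many (hotspot) or too few (coldspot) breakpoints. The Poisson test, the function names `poisson_cdf` and `classify_regions`, and the `alpha` threshold are illustrative choices, not the authors' published procedure.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term.

    Illustrative only: math.exp(-lam) underflows for very large lam.
    """
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)  # guard against floating-point overshoot


def classify_regions(observed, susceptibility, alpha=0.05):
    """Label each region 'hotspot', 'coldspot' or 'neutral' (hypothetical helper).

    observed: SV breakpoint counts per region.
    susceptibility: model-predicted relative DSB susceptibility per region.
    Expected counts are the susceptibilities rescaled to sum to the total
    number of observed breakpoints (a simple neutral-mutation null).
    """
    total_obs = sum(observed)
    total_sus = sum(susceptibility)
    threshold = alpha / len(observed)  # Bonferroni correction over regions
    labels = []
    for obs, sus in zip(observed, susceptibility):
        expected = total_obs * sus / total_sus
        p_high = 1.0 - poisson_cdf(obs - 1, expected)  # P(X >= obs)
        p_low = poisson_cdf(obs, expected)             # P(X <= obs)
        if p_high < threshold:
            labels.append("hotspot")   # more SVs than susceptibility predicts
        elif p_low < threshold:
            labels.append("coldspot")  # fewer SVs than susceptibility predicts
        else:
            labels.append("neutral")
    return labels


print(classify_regions([40, 0, 10, 10], [1, 3, 1, 1]))
```

With four regions whose expected counts work out to 10, 30, 10 and 10, the first region (40 observed) is flagged as a hotspot, the second (0 observed despite high susceptibility) as a coldspot, and the remaining two as neutral. A real analysis would likely need an overdispersed null (for example a negative binomial) and a less conservative multiple-testing correction; the Poisson model is used here only because it keeps the sketch short.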
Conclusions: We conclude that models of DSB susceptibility offer a rigorous approach to the inference of SVs putatively subject to selection in tumours.

Keywords: Double strand break, cancer, structural variation, chromatin, modelling

Background

Structural variation (SV) in tumour genomes is known to play important roles in disease progression and may be critical in driving the development of certain cancer types (1-3). However, challenges remain not only in ascertaining accurate SV calls, as evidenced by the range of SV calling algorithms used in many projects (4-6), but also in predicting their functional impact. Some SVs have apparently direct consequences: for example, amplification of oncogenes leading to overexpression, deletion of tumour suppressors leading to loss of function, and translocations generating oncogenic fusion proteins (4). Reported indirect consequences of SVs include changes in enhancer targeting that affect the expression of nearby genes, so-called "enhancer hijacking" (7). However, it remains
Project (30) and others, allowing well-matched models to be constructed for all datasets. We demonstrate that these models provide accurate estimates of the expected rate of DSBs in a given region and can be cross-applied between DSB datasets. In addition, the models can be used to explore tumour SV breakpoint data, to nominate novel regions p...