Microsatellites are repeats of 1-6 bp units, and ~10 million microsatellites have been identified across the human genome. Microsatellites are vulnerable to DNA mismatch [...] germline variation of the microsatellites suggested that the amount of germline variation and the somatic mutation rate were correlated. Lastly, analysis of mutations in mismatch repair genes showed that somatic SNVs and short indels had a larger functional impact than germline mutations and structural variations. Our analysis provides a comprehensive picture of mutations in microsatellite regions, reveals possible causes of these mutations, and provides a useful marker set for MSI detection.
Introduction

Recent large-scale whole genome sequencing studies have revealed the complexity of the mutational landscape of the cancer genome (1-4). In cancer genomes, various types of mutations, such as SNVs (single nucleotide variants), short indels (insertions and deletions), genomic rearrangements, copy number alterations, retrotransposon insertions, and virus genome integrations, have been identified, and their oncogenic roles have been characterized (1-5). Additionally, genome sequencing studies have revealed the molecular basis of somatic mutations (6-9). However, somatic mutations in microsatellites or repeat sequences have not been well characterized in large whole genome sequencing cohorts because of the difficulty of accurately detecting such mutations with presently available short-read sequencing technologies.

A microsatellite is defined as a tract of repetitive DNA composed of short repeating units (10). The mutation rate of microsatellites is known to be higher than that of other genomic regions because of DNA polymerase slippage during DNA replication and repair (10). Owing to their fragility, microsatellites are used as markers of genomic instability in cancer (11). In cancer genetics studies, microsatellite instability (MSI) has been used for the molecular diagnosis of Lynch syndrome and of cancers with mismatch repair deficiency (11). Furthermore, MSI-positive tumors generally carry higher numbers of somatic mutations and present many mutation-associated neo-antigens, which might be recognized by the immune system. Presently, MSI can also be used as a marker to predict the effect of immune therapy (12). The MSI phenotype is most common in colorectal, stomach, and uterine endometrial cancers (10-15%), although it has also been observed across many other tumor types at frequencies of a few percent (11).
The MSI phenotype is defined by the presence of somatic indels in 2-5 microsatellite markers, of which the BAT25/26 mononucleotide microsatellites are widely used to establish MSI status (11).

Despite the clinical importance of microsatellites, large-scale analysis of somatic changes in microsatellites across various types of cancer has been limited for whole genome sequencing (WGS) data (13, 14). In the current study, we analyzed indels in microsat...