Early-onset sporadic rectal cancer (EOSRC) is a unique and predominant colorectal cancer (CRC) subtype in India. To understand the tumorigenic process in EOSRC, we performed whole exome sequencing of 47 microsatellite stable EOSRC samples. Signature 1 was the predominant mutational signature in EOSRC, as previously shown in other CRC exome studies. More importantly, we identified TP53, KRAS, APC, PIK3R1 and SMAD4 as significantly mutated (q<0.1) and ARID1A and ARID2 as near-significantly mutated (restricted hypothesis testing; q<0.1) candidate drivers. Unlike the other candidates, the tumorigenic potential of ARID2, which encodes a component of the SWI/SNF chromatin remodeling complex, is largely unexplored in CRC. shRNA-mediated ARID2 knockdown in two different CRC cell lines resulted in significant alterations in transcript levels of cancer-related target genes. More importantly, ARID2 knockdown promoted several tumorigenic features, including cell viability, proliferation, ability to override contact inhibition of growth, and migration, and significantly increased tumor formation ability in nude mice. The observed gain in tumorigenic features was rescued upon ectopic expression of ARID2. Analyses of the TCGA CRC dataset revealed poorer survival in patients with ARID2 alterations. We therefore propose ARID2 as a novel tumor suppressor in CRC.

Keywords: Rectal cancer; mutational signatures; tumor suppressor; cancer driver genes; ARID2

genes/pathways 28, 42. We now report identification of ARID2 as a novel tumor suppressor for CRC based on a whole exome sequencing analysis of EOSRC samples.
Results
Whole exome sequencing reveals known and novel characteristics in EOSRC

To identify molecular alterations underlying rectal adenocarcinoma, we performed whole exome sequencing analysis of 47 carefully selected, well-annotated microsatellite stable (MSS) rectal tumor and matched normal sample pairs (EOSRC-IN). All samples were from patients aged below 61 years (average 46 years; range 22-60; Table S1). Sequence data analyses and variant calling (see Methods and Figure S1) identified 17,471 substitutions and 1,432 small insertions and deletions (indels) (Table S2A). Substitutions outnumbered indels across all samples, with a mean rate per megabase (MB) of 6.4 (range 1-35 per sample) for substitutions and 0.5 (range 0-2.35 per sample) for indels (Figure 1a). Five samples (EOSRC-IN-1095, 2575, 2603, 2643 and 2669) showed a significantly higher mutation rate (>12/MB) and can be considered to exhibit a 'hypermutator-like condition' as described in The Cancer Genome Atlas (TCGA) study on CRC 6. Given that all samples were MSS (the status of the 'hypermutator-like' samples was re-confirmed using the same DNA sample that was used for exome sequencing), the high mutation rate is surprising. The mean rate for substitutions and indels in non-hypermutated (excluding the 'hypermu...