It has long been known that canonical 5' splice site (5'SS) GT>GC mutations may be compatible with normal splicing. However, to date, the true scale of canonical 5'SS GT>GC mutations generating wild-type transcripts, both in the context of the frequency of such mutations and the level of wild-type transcripts generated from the mutant alleles, remains unknown. Herein, combining data derived from a meta-analysis of 45 informative disease-causing 5'SS GT>GC mutations (from 42 genes) and a cell culture-based full-length gene splicing assay of 103 5'SS GT>GC mutations (from 30 genes), we estimate that ~15-18% of the canonical GT 5'SSs are capable of generating between 1 and 84% normal transcripts as a consequence of the substitution of GT by GC. We further demonstrate that the canonical 5'SSs whose substitutions of GT by GC generated normal transcripts show stronger complementarity to the 5' end of U1 snRNA than those sites whose substitutions of GT by GC did not lead to the generation of normal transcripts. We also observed a correlation between the generation of wild-type transcripts and a milder than expected clinical phenotype but found that none of the available splicing prediction tools were able to accurately predict the functional impact of 5'SS GT>GC mutations.
Our findings imply that 5'SS GT>GC mutations may not invariably cause human disease but should also help to improve our understanding of the evolutionary processes that accompanied GT>GC subtype switching of U2-type introns in mammals.

Keywords: Canonical 5' splice site, Full-length gene splicing assay, Genotype and phenotype relationship, Human Gene Mutation Database, Human inherited disease, Meta-analysis, Non-canonical splice donor site, U2-type intron

5' splice site (5'SS) has traditionally been described as 5'-MAG/GURAGU-3' (where M denotes C or A,
R denotes A or G and / denotes the exon-intron boundary; the corresponding nucleotide positions are denoted -3_-1/+1_+6) although in reality this consensus sequence does not reflect the true extent of sequence variability [6, 7, 8, 9, 10, 11]. Base-pairing of this 9-bp sequence with 3'-GUCCAUUCA-5' at the 5' end of U1 snRNA (Figure 1A) is critical for splicing to occur [10, 12, 13, 14, 15]. Although the GT dinucleotide in the first two intronic positions (in the context of DNA sequence) is the most highly conserved portion of the U2-type 5'SS, it was reported, as early as 1983, that GC occasionally occurs in place of GT [16, 17, 18].
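
As an illustration of the base-pairing described above, the following minimal Python sketch counts how many of the nine 5'SS positions (-3 to +6) can form a Watson-Crick pair with the 3'-GUCCAUUCA-5' sequence at the 5' end of U1 snRNA. It is not the scoring method used in this study: the function name `u1_complementarity` is hypothetical, G-U wobble pairs are deliberately not counted, and no thermodynamic weighting is applied.

```python
# U1 snRNA 5' end, written 3'->5' as in the text, so that index 0 pairs
# with 5'SS position -3 and index 8 pairs with position +6.
U1_3TO5 = "GUCCAUUCA"

# Watson-Crick pairs only (assumption: G-U wobble pairs are ignored).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def u1_complementarity(site: str) -> int:
    """Count Watson-Crick pairs between a 9-nt 5'SS (5'->3') and U1 snRNA."""
    rna = site.upper().replace("T", "U")  # accept DNA or RNA input
    if len(rna) != len(U1_3TO5):
        raise ValueError("expected a 9-nt sequence spanning positions -3 to +6")
    return sum((s, u) in WATSON_CRICK for s, u in zip(rna, U1_3TO5))


if __name__ == "__main__":
    print(u1_complementarity("CAGGTAAGT"))  # fully complementary GT site: 9
    print(u1_complementarity("CAGGCAAGT"))  # same site after GT>GC: 8
```

In this simplified count, a GT>GC substitution always removes exactly one pair (position +2), so the remaining eight positions determine how much complementarity is retained.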
Subsequent genome-wide analyses have established that this non-canonical 5'SS GC is present as wild-type in ~1% of human U2-type introns [2, 7, 8, 19, 20]. Importantly, the remaining nucleotides in these evolutionarily fixed non-canonical GC 5'SSs exhibit a stronger complementarity to the 3'-GUCCAUUCA-5' sequence at the 5' end of U1 snRNA than those in the canonical GT 5'SSs (Figure 1A), thereby in all likelihood compensating for the decreased complementarity between the ...