In mammals, the cytosine in CG dinucleotides is typically methylated, producing
5-methylcytosine (5mC), a chemically less stable form of cytosine that can spontaneously
deaminate to thymine, resulting in a T•G mismatched base pair. Other eukaryotes
efficiently repair this mismatch back to C•G, but in mammals 5mCG
deamination is mutagenic, sometimes producing TG dinucleotides, which explains the depletion
of CG dinucleotides in mammalian genomes. It has been suggested that new TG dinucleotides
generate genetic diversity that may be critical for evolutionary change. We tested this
conjecture by examining the DNA sequence properties of regulatory sequences identified by
DNase I hypersensitive sites (DHSs) in human and mouse genomes. We hypothesized that the
new TG dinucleotides generate transcription factor binding sites (TFBS) that become
tissue-specific DHSs (TS-DHSs). We find that 8-mers containing the CG dinucleotide are
enriched in DHSs in both species. However, 8-mers containing a TG and no CG dinucleotide
are preferentially enriched in TS-DHSs when compared with 8-mers with neither a TG nor a
CG dinucleotide. The most enriched 8-mer with a TG and no CG dinucleotide in
tissue-specific regulatory regions in both genomes is the AP-1 motif
(TGA(C/G)TCAN), and we find evidence that
TG dinucleotides in the AP-1 motif arose from CG dinucleotides. Additional TS-DHS-enriched
TFBS containing the TG/CA dinucleotide are the E-Box motif
(GCAGCTGC), the NF-1 motif (GGCA---TGCC), and the
GR (glucocorticoid receptor) motif (G-ACA---TGT-C). Our results support the
suggestion that cytosine methylation is mutagenic in tetrapods, producing TG dinucleotides
that create TFBS that drive evolution.
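
The three 8-mer classes compared above (containing a CG; containing a TG but no CG; containing neither) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names are hypothetical, and counting CA (the reverse complement of TG) as a TG occurrence is an assumption based on the text's use of "TG/CA".

```python
from collections import Counter


def classify_kmer(kmer: str) -> str:
    """Assign a k-mer to one of three dinucleotide classes.

    Assumption: a TG on either strand counts, so CA is treated like TG.
    CG is its own reverse complement, so one check covers both strands.
    """
    kmer = kmer.upper()
    if "CG" in kmer:
        return "CG"
    if "TG" in kmer or "CA" in kmer:
        return "TG_no_CG"
    return "neither"


def count_kmer_classes(seq: str, k: int = 8) -> Counter:
    """Count the classes of all overlapping k-mers in a sequence.

    k-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[classify_kmer(kmer)] += 1
    return counts
```

Under these assumptions, an enrichment comparison would apply `count_kmer_classes` to DHS sequences and to a background set, then compare the per-class frequencies. The choice of background and of statistic is left open here.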