We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and reveal extensive strain-specific haplotype variation. We identify and characterise 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures and the addition of 62 new coding loci to the reference genome annotation. Notably, these genomes revealed a large, previously unannotated gene (Efcab3-like) encoding 5,874 amino acids. Efcab3-like mutant mice display anomalies in multiple brain regions, suggesting a role in the regulation of brain development.
Summary
Generally repressed by epigenetic mechanisms, retrotransposons represent around 40% of the murine genome. At the Agouti viable yellow (Avy) locus, an endogenous retrovirus (ERV) of the intracisternal A particle (IAP) class retrotransposed upstream of the agouti coat-color locus, providing an alternative promoter that is variably DNA methylated in genetically identical individuals. This results in variable expressivity of coat color that is inherited transgenerationally. Here, a systematic genome-wide screen identifies multiple C57BL/6J murine IAPs with Avy epigenetic properties. Each exhibits a stable methylation state within an individual but varies between individuals. Only in rare instances do they act as promoters controlling adjacent gene expression. Their methylation state is locus-specific within an individual, and their flanking regions are enriched for CTCF. Variably methylated IAPs are reprogrammed after fertilization and re-established as variable loci in the next generation, indicating reconstruction of metastable epigenetic states and challenging the generalizability of non-genetic inheritance at these regions.
The most commonly employed mammalian model organism is the laboratory mouse. A wide variety of genetically diverse inbred mouse strains, representing distinct physiological states, disease susceptibilities, and biological mechanisms, have been developed over the last century. We report full-length draft de novo genome assemblies for 16 of the most widely used inbred strains and reveal, for the first time, extensive strain-specific haplotype variation. We identify and characterise 2,567 regions on the current Genome Reference Consortium mouse reference genome exhibiting the greatest sequence diversity between strains. These regions are enriched for genes involved in defence and immunity, and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. Several immune-related loci, some in previously identified QTLs for disease response, have novel haplotypes not present in the reference that may explain the phenotype. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures and the addition of 62 new coding loci to the reference genome annotation. Notably, this high-quality collection of genomes revealed a previously unannotated gene (Efcab3-like) encoding 5,874 amino acids, one of the largest known in the rodent lineage. Interestingly, Efcab3-like