Summary: As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
Summary: Eukaryotic chromosomes replicate in a temporal order known as the replication-timing program [1]. During mammalian development, at least half the genome changes replication timing, primarily in units of 400–800 kb (“replication domains”; RDs), whose positions are preserved in different cell types, conserved between species, and appear to confine long-range effects of chromosome rearrangements [2–7]. Early and late replication correlate strongly with open and closed chromatin compartments identified by high-resolution chromosome conformation capture (Hi-C), and, to a lesser extent, lamina-associated domains (LADs) [4,5,8,9]. Recent Hi-C mapping has unveiled a substructure of topologically-associating domains (TADs) that are largely conserved in their positions between cell types and are similar in size to RDs [8,10]. However, TADs can be further sub-stratified into smaller domains, challenging the significance of structures at any particular scale [11,12]. Moreover, attempts to reconcile TADs and LADs to replication-timing data have not revealed a common, underlying domain structure [8,9,13]. Here, we localize boundaries of RDs to the early-replicating border of replication-timing transitions and map their positions in 18 human and 13 mouse cell types. We demonstrate that, collectively, RD boundaries share a near one-to-one correlation with TAD boundaries, whereas within a cell type, adjacent TADs that replicate at similar times obscure RD boundaries, largely accounting for the previously reported lack of alignment. Moreover, cell-type-specific replication timing of TADs partitions the genome into two large-scale sub-nuclear compartments, revealing that replication-timing transitions are indistinguishable from late-replicating regions in chromatin composition and lamina association and accounting for the reduced correlation of replication timing to LADs and heterochromatin. Our results reconcile cell-type-specific sub-nuclear compartmentalization with developmentally stable chromosome domains and offer a unified model for large-scale chromosome structure and function.
Duplication of the genome in mammalian cells occurs in a defined temporal order referred to as its replication-timing (RT) program. RT changes dynamically during development, regulated in units of 400-800 kb referred to as replication domains (RDs). Changes in RT are generally coordinated with transcriptional competence and changes in subnuclear position. We generated genome-wide RT profiles for 26 distinct human cell types, including embryonic stem cell (hESC)-derived, primary cells and established cell lines representing intermediate stages of endoderm, mesoderm, ectoderm, and neural crest (NC) development. We identified clusters of RDs that replicate at unique times in each stage (RT signatures) and confirmed global consolidation of the genome into larger synchronously replicating segments during differentiation. Surprisingly, transcriptome data revealed that the well-accepted correlation between early replication and transcriptional activity was restricted to RT-constitutive genes, whereas two-thirds of the genes that switched RT during differentiation were strongly expressed when late replicating in one or more cell types. Closer inspection revealed that transcription of this class of genes was frequently restricted to the lineage in which the RT switch occurred, but was induced prior to a late-to-early RT switch and/or down-regulated after an early-to-late RT switch. Analysis of transcriptional regulatory networks showed that this class of genes contains strong regulators of genes that were only expressed when early replicating. These results provide intriguing new insight into the complex relationship between transcription and RT regulation during human development.
The YgjD/Kae1 family (COG0533) has been on the top-10 list of universally conserved proteins of unknown function for over 5 years. It has been linked to DNA maintenance in bacteria and mitochondria and transcription regulation and telomere homeostasis in eukaryotes, but its actual function has never been found. Based on a comparative genomic and structural analysis, we predicted this family was involved in the biosynthesis of N6-threonylcarbamoyl adenosine, a universal modification found at position 37 of tRNAs decoding ANN codons. This was confirmed as a yeast mutant lacking Kae1 is devoid of t6A. t6A− strains were also used to reveal that t6A has a critical role in initiation codon restriction to AUG and in restricting frameshifting at tandem ANN codons. We also showed that YaeZ, a YgjD paralog, is required for YgjD function in vivo in bacteria. This work lays the foundation for understanding the pleiotropic role of this universal protein family.
Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level.