We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
Caenorhabditis elegans is an animal with few cells but a wide diversity of cell types. In this study, we characterize the molecular basis for their specification by profiling the transcriptomes of 86,024 single embryonic cells. We identify 502 terminal and preterminal cell types, mapping most single-cell transcriptomes to their exact position in C. elegans’ invariant lineage. Using these annotations, we find that (i) the correlation between a cell’s lineage and its transcriptome increases from middle to late gastrulation, then falls substantially as cells in the nervous system and pharynx adopt their terminal fates; (ii) multilineage priming contributes to the differentiation of sister cells at dozens of lineage branches; and (iii) most distinct lineages that produce the same anatomical cell type converge to a homogeneous transcriptomic state.
Regulation of gene expression by sequence-specific transcription factors is central to developmental programs and depends on the binding of transcription factors to target sites in the genome. To date, most such analyses in Caenorhabditis elegans have focused on the interactions between a single transcription factor and one or a few select target genes. As part of the modENCODE Consortium, we have used chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq) to determine the genome-wide binding sites of 22 transcription factors (ALR-1, BLMP-1, CEH-14, CEH-30, EGL-27, EGL-5, ELT-3, EOR-1, GEI-11, HLH-1, LIN-11, LIN-13, LIN-15B, LIN-39, MAB-5, MDL-1, MEP-1, PES-1, PHA-4, PQM-1, SKN-1, and UNC-130) at diverse developmental stages. For each factor we determined candidate gene targets, both coding and non-coding. The typical binding sites of almost all factors are within a few hundred nucleotides of the transcript start site. Most factors target a mixture of coding and non-coding genes, although one factor preferentially binds to non-coding RNA genes. We built a regulatory network among the 22 factors to determine their functional relationships to each other and found that some factors appear to act preferentially as regulators and others as target genes. Examination of the binding targets of three related HOX factors (LIN-39, MAB-5, and EGL-5) indicates that these factors regulate genes involved in cellular migration, neuronal function, and vulval differentiation, consistent with their known roles in these developmental processes. Ultimately, the comprehensive mapping of transcription factor binding sites will identify features of transcriptional networks that regulate C. elegans developmental processes.
Understanding the in vivo dynamics of protein localization and physical interactions is important for many problems in biology. To enable systematic protein function interrogation in a multicellular context, we built a genome-scale transgenic platform for in vivo expression of fluorescent and affinity-tagged proteins in Caenorhabditis elegans under endogenous cis-regulatory control. The platform combines computer-assisted transgene design, massively parallel DNA engineering, and next-generation sequencing to generate a resource of 14,637 genomic DNA transgenes, which covers 73% of the proteome. The multipurpose tag used allows any protein of interest to be localized in vivo or affinity purified using standard tag-based assays. We illustrate the utility of the resource by systematic chromatin immunopurification and automated 4D imaging, which produced detailed DNA binding and cell/tissue distribution maps for key transcription factor proteins.
How cells adopt different expression patterns is a fundamental question of developmental biology. We quantitatively measured reporter expression of 127 genes, primarily transcription factors, in every cell and with high temporal resolution in C. elegans embryos. Embryonic cells are highly distinct in their gene expression; expression of the 127 genes studied here can distinguish nearly all pairs of cells, even between cells of the same tissue type. We observed recurrent lineage-regulated expression patterns for many genes in diverse contexts. These patterns are regulated in part by the TCF-LEF transcription factor POP-1. Other genes' reporters exhibited patterns correlated with tissue, position, and left–right asymmetry. Sequential patterns both within tissues and series of sublineages suggest regulatory pathways. Expression patterns often differ between embryonic and larval stages for the same genes, emphasizing the importance of profiling expression in different stages. This work greatly expands the number of genes in each of these categories and provides the first large-scale, digitally based, cellular resolution compendium of gene expression dynamics in live animals. The resulting data sets will be a useful resource for future research.