We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 Mb and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than 1/3 of Daphnia’s genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The co-expansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different environmental conditions reveals that numerous paralogs acquire divergent expression patterns soon after duplication. Daphnia-specific genes – including many additional loci within sequenced regions that are otherwise devoid of annotations – are the most responsive genes to ecological challenges.
Genomes assembled de novo from short reads are highly fragmented relative to the finished chromosomes of H. sapiens and key model organisms generated by the Human Genome Project. To address this, we need scalable, cost-effective methods enabling chromosome-scale contiguity. Here we show that genome-wide chromatin interaction datasets, such as those generated by Hi-C, are a rich source of long-range information for assigning, ordering and orienting genomic sequences to chromosomes, including across centromeres. To exploit this, we developed an algorithm that uses Hi-C data for ultra-long-range scaffolding of de novo genome assemblies. We demonstrate the approach by combining shotgun fragment and short jump mate-pair sequences with Hi-C data to generate chromosome-scale de novo assemblies of the human, mouse and Drosophila genomes, achieving – for human – 98% accuracy in assigning scaffolds to chromosome groups and 99% accuracy in ordering and orienting scaffolds within chromosome groups. Hi-C data can also be used to validate chromosomal translocations in cancer genomes.
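The abstract above does not spell out the scaffolding algorithm, so what follows is only a minimal sketch of the first step it names, assigning sequences to chromosome groups. It assumes that contigs on the same chromosome share far more Hi-C links than contigs on different chromosomes. The greedy average-linkage clustering, the `cluster_contigs` function, and the input format are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def cluster_contigs(contacts, n_groups):
    """Greedy average-linkage clustering of contigs by Hi-C link density.

    contacts: dict mapping frozenset({a, b}) -> normalized link count between
              contigs a and b (a hypothetical input format for this sketch).
    n_groups: expected number of chromosome groups.
    """
    n_groups = max(n_groups, 1)
    contigs = sorted({c for pair in contacts for c in pair})
    groups = [{c} for c in contigs]

    def avg_links(g1, g2):
        # Mean link count over all contig pairs spanning the two groups.
        total = sum(contacts.get(frozenset((a, b)), 0.0) for a in g1 for b in g2)
        return total / (len(g1) * len(g2))

    while len(groups) > n_groups:
        # Merge the two groups with the highest average link density.
        i, j = max(
            combinations(range(len(groups)), 2),
            key=lambda ij: avg_links(groups[ij[0]], groups[ij[1]]),
        )
        groups[i] |= groups[j]
        del groups[j]  # safe: combinations() guarantees i < j
    return sorted(sorted(g) for g in groups)


# Toy data (invented): c1-c3 behave like one chromosome, c4-c5 like another.
toy = {
    frozenset(("c1", "c2")): 90.0,
    frozenset(("c2", "c3")): 80.0,
    frozenset(("c1", "c3")): 40.0,
    frozenset(("c4", "c5")): 85.0,
    frozenset(("c1", "c4")): 3.0,
    frozenset(("c2", "c5")): 2.0,
    frozenset(("c3", "c4")): 4.0,
}
print(cluster_contigs(toy, 2))
```

Ordering and orienting contigs within each group would need further steps that exploit the decay of contact frequency with genomic distance; those are not sketched here.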
The functional consequences of genetic variation in mammalian regulatory elements are poorly understood. We report the in vivo dissection of three mammalian liver enhancers at single nucleotide resolution via a massively parallelized reporter assay. For each enhancer, we synthesized a library of >100,000 mutant haplotypes with 2–3% divergence from wild-type. Each haplotype was linked to a unique sequence tag embedded within a transcriptional cassette. We introduced each enhancer library into mouse liver and measured the relative activities of individual haplotypes en masse by sequencing of the transcribed tags. Linear regression yielded highly reproducible estimates of the impact of every possible single nucleotide change on enhancer activity. The functional impact of most mutations was modest, with ~22% impacting activity by >1.2-fold, and only ~3% by >2-fold. These results suggest that mammalian enhancers are relatively robust to single nucleotide changes. Several, but not all, positions with higher impact showed evidence for purifying selection or co-localized with known liver-associated transcription factor binding sites, demonstrating the value of empirical high-resolution functional analysis.
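The regression step described above can be illustrated with a small sketch. Each haplotype carries several mutations at once, so per-mutation effects are estimated jointly by ordinary least squares on a 0/1 design matrix. The additive model on log activity, the function names, and the toy mutation labels are all assumptions made for illustration; the abstract does not specify the exact model.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mutation_effects(haplotypes, log_activity, mutations):
    """Least squares fit of: log_activity ~ intercept + sum of mutation effects.

    haplotypes:   list of sets of mutation labels carried by each haplotype.
    log_activity: measured log activity per haplotype (assumed scale).
    mutations:    list of all mutation labels to estimate.
    """
    # Design matrix: a leading 1 for the intercept, then one 0/1 column per mutation.
    x = [[1.0] + [1.0 if mut in hap else 0.0 for mut in mutations]
         for hap in haplotypes]
    p = len(mutations) + 1
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(x, log_activity)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return beta[0], dict(zip(mutations, beta[1:]))


# Toy example with invented, noise-free data.
mutations = ["A5G", "C9T", "T12A"]
true_effects = {"A5G": -0.5, "C9T": 0.1, "T12A": -1.2}
haplotypes = [set(), {"A5G"}, {"C9T"}, {"T12A"}, {"A5G", "C9T"}, {"C9T", "T12A"}]
log_activity = [1.0 + sum(true_effects[m] for m in hap) for hap in haplotypes]

intercept, effects = fit_mutation_effects(haplotypes, log_activity, mutations)
print(intercept, effects)
```

With real data the measurements are noisy and the library holds over 100,000 haplotypes, so a numerical library would replace the hand-written solver; the pure-Python version is used here only to keep the sketch self-contained.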
We present a method that harnesses massively parallel DNA synthesis and sequencing for the high-throughput functional analysis of regulatory sequences at single-nucleotide resolution. As a proof of concept, we quantitatively assayed the effects of all possible single-nucleotide mutations for three bacteriophage promoters and three mammalian core promoters in a single experiment per promoter. The method may also serve as a rapid screening tool for regulatory element engineering in synthetic biology. A broad range of methods exist for annotating functional regulatory elements in genomes. These include comparative and ab initio prediction algorithms [1-3] and high-throughput assays such as ChIP-Seq [4] and CAGE [5,6]. Despite much progress, the architectures of the vast majority of regulatory elements have yet to be systematically and quantitatively dissected at high resolution. Effective methods for this include classical saturation mutagenesis [7] and combinatorial promoter shuffling [8,9], but these have been applied only at low throughput. Furthermore, the effects of promoter modification are measured using techniques that are not always sufficiently sensitive to detect subtle changes in transcription. Here we present a high-throughput method to systematically analyze the effect in a single experiment of mutations at every position in a core promoter (Fig. 1a). Mutant promoters are synthesized in parallel as DNA oligonucleotides on a programmable microarray and released into solution [10], resulting in a complex library. Each oligonucleotide in the library is designed to include a unique barcode sequence downstream of the promoter's transcription start site (TSS). The oligos are transcribed in vitro, and the resulting transcripts are sequenced.
The relative abundance of each programmed barcode provides a digital readout of the transcriptional efficiency of its cis-linked mutant promoter. As a proof of concept, this method was applied to three well-characterized bacteriophage promoters: T3 (class 3, phi13), T7 (class 3, phi10) and SP6 (SP6p32). We focused on a 35-nt region, spanning 23 nt upstream and 12 nt downstream of each promoter's TSS (Fig. 1b). At each position, we mutated the native nucleotide to every other nucleotide or introduced a single-nucleotide deletion. We also included several double-mutation promoters, allowing us to compare the single mutants to their combination. To guard against the potential influence of the barcode itself on transcriptional activity, we represented each mutant variant of each native [...] Tables 1 and 2). The promoter library was transcribed in vitro with one of three RNA polymerases (T7, T3 or SP6). The resulting RNA pools were reverse transcribed, PCR amplified and sequenced on an Illumina GAII system. Reads were then mapped back to the 20-nt barcodes that we had programmed in cis with each synthetic promoter. To control for potentially non-uniform represe...
Correspondence should be addressed to J.S. (shendure@u.washington.edu) or R.P.P. (rpatward@u.washington.edu).
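The library design and barcode readout described in the promoter excerpt above can be sketched as follows. The first function enumerates every substitution and single-nucleotide deletion in a window; the second averages read counts over the barcodes assigned to each variant and reports them relative to a reference. The function names, the `"WT"` label, the 1-based position labels, and the toy counts are assumptions for illustration. Because the excerpt is cut off before the normalization step, no correction for library representation is attempted here.

```python
from collections import defaultdict

NUCLEOTIDES = "ACGT"


def single_nucleotide_variants(seq):
    """Yield (label, sequence) for every substitution and single-nt deletion.

    Positions in labels are 1-based offsets within `seq`, not TSS-relative.
    """
    for pos, ref in enumerate(seq):
        for alt in NUCLEOTIDES:
            if alt != ref:
                yield f"{ref}{pos + 1}{alt}", seq[:pos] + alt + seq[pos + 1:]
        yield f"{ref}{pos + 1}del", seq[:pos] + seq[pos + 1:]


def relative_activity(barcode_counts, barcode_to_variant, reference="WT"):
    """Mean read count per barcode for each variant, relative to the reference.

    barcode_counts:     dict barcode -> number of sequenced transcripts.
    barcode_to_variant: dict barcode -> variant label (the programmed design).
    """
    totals = defaultdict(int)
    n_barcodes = defaultdict(int)
    for barcode, variant in barcode_to_variant.items():
        n_barcodes[variant] += 1
        # Programmed barcodes that were never observed count as zero reads.
        totals[variant] += barcode_counts.get(barcode, 0)
    means = {v: totals[v] / n_barcodes[v] for v in totals}
    if means.get(reference, 0) == 0:
        raise ValueError("reference variant has no reads")
    return {v: m / means[reference] for v, m in means.items()}


# A 35-nt window gives 35 * 3 substitutions + 35 deletions = 140 labeled variants.
# The sequence below is a placeholder, not a real promoter.
placeholder = "ACGTA" * 7
print(len(list(single_nucleotide_variants(placeholder))))

# Toy counts (invented); the barcode "zz" was not programmed and is ignored.
counts = {"b1": 100, "b2": 100, "b3": 50, "b4": 30, "zz": 7}
design = {"b1": "WT", "b2": "WT", "b3": "A1C", "b4": "A1C"}
print(relative_activity(counts, design))
```

Deletions inside a homopolymer run produce identical sequences under different labels, so a real design would deduplicate by sequence before synthesis.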