We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 Mb and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than 1/3 of Daphnia’s genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The co-expansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different environmental conditions reveals that numerous paralogs acquire divergent expression patterns soon after duplication. Daphnia-specific genes – including many additional loci within sequenced regions that are otherwise devoid of annotations – are the most responsive genes to ecological challenges.
Knowledge of the rate and nature of spontaneous mutation is fundamental to understanding evolutionary and molecular processes. In this report, we analyze spontaneous mutations accumulated over thousands of generations by wild-type Escherichia coli and a derivative defective in mismatch repair (MMR), the primary pathway for correcting replication errors. The major conclusions are (i) the mutation rate of a wild-type E. coli strain is ∼1 × 10 −3 per genome per generation; (ii) mutations in the wild-type strain have the expected mutational bias for G:C > A:T mutations, but the bias changes to A:T > G:C mutations in the absence of MMR; (iii) during replication, A:T > G:C transitions preferentially occur with A templating the lagging strand and T templating the leading strand, whereas G:C > A:T transitions preferentially occur with C templating the lagging strand and G templating the leading strand; (iv) there is a strong bias for transition mutations to occur at 5′ApC3′/3′TpG5′ sites (where bases 5′A and 3′T are mutated) and, to a lesser extent, at 5′GpC3′/3′CpG5′ sites (where bases 5′G and 3′C are mutated); (v) although the rate of small (≤4 nt) insertions and deletions is high at repeat sequences, these events occur at only 1/10th the genomic rate of base-pair substitutions. MMR activity is genetically regulated, and bacteria isolated from nature often lack MMR capacity, suggesting that modulation of MMR can be adaptive. Thus, comparing results from the wild-type and MMR-defective strains may lead to a deeper understanding of factors that determine mutation rates and spectra, how these factors may differ among organisms, and how they may be shaped by environmental conditions. evolution | mutation accumulation | neutral mutation | mutational hotspots | indels M utations are the source of variation upon which natural selection acts; thus, a complete understanding of evolutionary processes must include an accurate assessment of mutation rates and of the molecular spectrum of mutational events. In addition, we need to know whether, and how, intrinsic and extrinsic factors influence mutational processes. This understanding must be founded on baseline parameters established by analyzing mutations that accumulate in a neutral fashion, unbiased by selective pressures. Much of our knowledge of spontaneous mutation is based on mutations that occur in nonessential reporter genes during short-term laboratory culture of microorganisms (1). An alternative approach is to compare presumably neutral mutations that have accumulated over evolutionary time periods in diverged species (2). Both methods have substantial uncertainties. The experimental approach may use reporter loci that are not representative of the whole genome and necessarily incorporates assumptions about the expression and neutrality of the mutant phenotypes. The historical approach relies on estimated divergence times and the absence of selective pressure on synonymous sequence changes. High-throughput whole-genome sequencing allows some of these limitations to be...
For the last 20 years, fragment assembly in DNA sequencing followed the ''overlap-layout-consensus'' paradigm that is used in all currently available assembly tools. Although this approach proved useful in assembling clones, it faces difficulties in genomic shotgun assembly. We abandon the classical ''overlap-layout-consensus'' approach in favor of a new EULER algorithm that, for the first time, resolves the 20-year-old ''repeat problem'' in fragment assembly. Our main result is the reduction of the fragment assembly to a variation of the classical Eulerian path problem that allows one to generate accurate solutions of large-scale sequencing problems. EULER, in contrast to the CELERA assembler, does not mask such repeats but uses them instead as a powerful fragment assembly tool.
The advances of next-generation sequencing technology have facilitated metagenomics research that attempts to determine directly the whole collection of genetic material within an environmental sample (i.e. the metagenome). Identification of genes directly from short reads has become an important yet challenging problem in annotating metagenomes, since the assembly of metagenomes is often not available. Gene predictors developed for whole genomes (e.g. Glimmer) and recently developed for metagenomic sequences (e.g. MetaGene) show a significant decrease in performance as the sequencing error rates increase, or as reads get shorter. We have developed a novel gene prediction method FragGeneScan, which combines sequencing error models and codon usages in a hidden Markov model to improve the prediction of protein-coding region in short reads. The performance of FragGeneScan was comparable to Glimmer and MetaGene for complete genomes. But for short reads, FragGeneScan consistently outperformed MetaGene (accuracy improved ∼62% for reads of 400 bases with 1% sequencing errors, and ∼18% for short reads of 100 bases that are error free). When applied to metagenomes, FragGeneScan recovered substantially more genes than MetaGene predicted (>90% of the genes identified by homology search), and many novel genes with no homologs in current protein sequence database.
Drosophila melanogaster cell lines are important resources for cell biologists. Here, we catalog the expression of exons, genes, and unannotated transcriptional signals for 25 lines. Unannotated transcription is substantial (typically 19% of euchromatic signal). Conservatively, we identify 1405 novel transcribed regions; 684 of these appear to be new exons of neighboring, often distant, genes. Sixty-four percent of genes are expressed detectably in at least one line, but only 21% are detected in all lines. Each cell line expresses, on average, 5885 genes, including a common set of 3109. Expression levels vary over several orders of magnitude. Major signaling pathways are well represented: most differentiation pathways are “off” and survival/growth pathways “on.” Roughly 50% of the genes expressed by each line are not part of the common set, and these show considerable individuality. Thirty-one percent are expressed at a higher level in at least one cell line than in any single developmental stage, suggesting that each line is enriched for genes characteristic of small sets of cells. Most remarkable is that imaginal disc-derived lines can generally be assigned, on the basis of expression, to small territories within developing discs. These mappings reveal unexpected stability of even fine-grained spatial determination. No two cell lines show identical transcription factor expression. We conclude that each line has retained features of an individual founder cell superimposed on a common “cell line“ gene expression pattern.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.