Large segmental duplications cover much of the Arabidopsis thaliana genome. Little is known about their origins. We show that they are primarily due to at least four different large-scale duplication events that occurred 100 to 200 million years ago, a formative period in the diversification of the angiosperms. A better understanding of the complex structural history of angiosperm genomes is necessary to make full use of Arabidopsis as a genetic model for other plant species.
A 105-kilobase bacterial artificial chromosome (BAC) clone from the ovate-containing region of tomato chromosome 2 was sequenced and annotated. The tomato BAC sequence was then compared, gene by gene, with the sequenced portions of the Arabidopsis thaliana genome. Rather than matching a single portion of the Arabidopsis genome, the tomato clone shows conservation of gene content and order with four different segments of Arabidopsis chromosomes 2-5. The gene order and content of these individual Arabidopsis segments indicate that they derived from a common ancestral segment through two or more rounds of large-scale genome duplication events, possibly polyploidy. One of these duplication events is ancient and may predate the divergence of the Arabidopsis and tomato lineages. The other is more recent and is estimated to have occurred after the divergence of tomato and Arabidopsis approximately 112 million years ago. Together, these data suggest that, on the scale of BAC-sized segments of DNA, chromosomal rearrangements (e.g., inversions and translocations) have been only a minor factor in the divergence of genome organization among plants. Rather, the dominating factors have been repeated rounds of large-scale genome duplication followed by selective gene loss. We hypothesize that these processes have led to the network of synteny revealed between tomato and Arabidopsis and predict that such networks of synteny will be common when making comparisons among higher plant taxa (e.g., families).
Keywords: genome evolution, polyploidy
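A gene-by-gene comparison like this one asks how many genes on the tomato clone have homologs lying in the same relative order on a given Arabidopsis segment. A minimal sketch, assuming a hypothetical `homolog_of` table (Arabidopsis gene ID to tomato gene ID) that has already been built, for example from sequence similarity searches: the count is the longest increasing subsequence of tomato positions, checked in both orientations so that a segment inverted as a whole still scores. This is an illustration of collinearity scoring, not the authors' procedure.

```python
import bisect
from typing import Mapping, Sequence


def _longest_increasing_run(positions: list[int]) -> int:
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails: list[int] = []
    for p in positions:
        i = bisect.bisect_left(tails, p)
        if i == len(tails):
            tails.append(p)
        else:
            tails[i] = p
    return len(tails)


def collinear_genes(query: Sequence[str], target: Sequence[str],
                    homolog_of: Mapping[str, str]) -> int:
    """Count query genes whose homologs in `target` keep the query's gene order.

    query:      gene IDs in order along the tomato BAC (hypothetical IDs)
    target:     gene IDs in order along one Arabidopsis segment
    homolog_of: Arabidopsis gene ID -> tomato gene ID (many-to-one allowed,
                so paralogs on different segments can share a tomato homolog)
    """
    query_pos = {gene: i for i, gene in enumerate(query)}
    # Walk the Arabidopsis segment and record where each homolog sits on the BAC.
    mapped = [query_pos[homolog_of[g]] for g in target
              if g in homolog_of and homolog_of[g] in query_pos]
    forward = _longest_increasing_run(mapped)
    reverse = _longest_increasing_run([-p for p in mapped])  # inverted segment
    return max(forward, reverse)
```

Scoring the same BAC against every Arabidopsis segment and keeping those with several collinear genes is one way a "network of synteny" (one tomato region matching multiple Arabidopsis segments) would show up.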
Genomics and bioinformatics have great potential to help address numerous topics in ecology and evolution. Expressed sequence tags (ESTs) can bridge genomics and molecular ecology because they can provide a means of accessing the gene space of almost any organism. We review how ESTs have been used in molecular ecology research in the last several years by providing sequence data for the design of molecular markers, genome-wide studies of gene expression and selection, the identification of candidate genes underlying adaptation, and the basis for studies of gene family and genome evolution. Given the tremendous recent advances in inexpensive sequencing technologies, we predict that molecular ecologists will increasingly be developing and using EST collections in the years to come. With this in mind, we close our review by discussing aspects of EST resource development of particular relevance for molecular ecologists.
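One common way EST sequence data feed marker design is by scanning them for simple sequence repeats (microsatellites). A minimal sketch, assuming perfect repeats of 2 to 6 bp units and an illustrative threshold of five copies; both settings are our assumptions, not recommendations from the review.

```python
import re


def find_ssrs(seq: str, min_copies: int = 5) -> list[tuple[int, str, int]]:
    """Return (start, repeat unit, copy number) for perfect SSRs in an EST.

    The lazy {2,6}? tries the shortest unit first, so (AT)n is reported as
    "AT" rather than "ATAT". min_copies must be at least 1.
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            # Homopolymer runs (e.g. AAAA...) match as "AA" repeats; skip them.
            continue
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits
```

Primer design around each hit (in the flanking EST sequence) would be the next step and is outside this sketch.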
New genes may arise through tandem duplication, dispersed small-scale duplication, and polyploidy, and patterns of divergence between duplicated genes may vary among these classes. We have examined patterns of gene expression and coding-sequence divergence between duplicated genes in Arabidopsis thaliana. Because the polyploidy-derived gene pairs originated simultaneously, we can compare covariation in the rates of expression divergence and sequence divergence within this group. Among tandem and dispersed duplicates, much of the divergence in expression profile appears to occur at or shortly after duplication. Contrary to findings from other eukaryotic systems, there is little relationship between expression divergence and synonymous substitutions, whereas there is a strong positive relationship between expression divergence and nonsynonymous substitutions. Because this pattern is pronounced among the polyploidy-derived pairs, we infer that the strength of purifying selection acting on protein sequence and on expression pattern is correlated. The polyploidy-derived pairs are somewhat atypical in that they have broader expression patterns and are expressed at higher levels, suggesting that polyploidy- and nonpolyploidy-derived duplicates differ in the types of genes that revert to single copy. Finally, within many of the duplicated pairs, one gene is expressed at a higher level across all assayed conditions, which suggests that the subfunctionalization model for duplicate gene preservation provides, at best, only a partial explanation for the patterns of expression divergence between duplicated genes.
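The comparison between expression divergence and substitution rates can be sketched as follows. This assumes expression divergence is scored as 1 minus the Pearson correlation of the two genes' profiles across the same conditions, and that the association with Ka (nonsynonymous) and Ks (synonymous) divergence is measured by Spearman rank correlation. Both are common choices but are assumptions here, not necessarily the metrics the authors used; the Ka/Ks values are taken as given inputs.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def expression_divergence(profile_a: Sequence[float], profile_b: Sequence[float]) -> float:
    """1 - r: 0 for identically shaped profiles, up to 2 for mirror-image ones."""
    return 1 - pearson(profile_a, profile_b)


def divergence_vs_substitutions(pairs):
    """pairs: (profile_a, profile_b, ka, ks) for each duplicate gene pair.

    Returns (rho with Ka, rho with Ks) for expression divergence.
    """
    div = [expression_divergence(a, b) for a, b, _, _ in pairs]
    ka = [p[2] for p in pairs]
    ks = [p[3] for p in pairs]
    return spearman(div, ka), spearman(div, ks)
```

The pattern described above would appear as a clearly positive rho with Ka and a rho near zero with Ks.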
BACKGROUND: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation boost". Furthermore, little is known about patterns in data reuse over time and across datasets. METHOD AND RESULTS: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data were not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation boost varied with the date of dataset deposition: it was clearest, at about 30%, for papers published in 2004 and 2005. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third-party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers.
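To make the 9% figure concrete, suppose, purely for illustration, that the regression was fit on log-transformed citation counts (the abstract does not state the model form). With $C_i$ the citation count of study $i$, $D_i$ an indicator for public data deposition, and $x_{ik}$ the covariates listed above, the coefficient on $D_i$ converts to a percentage boost as follows:

```latex
\log(C_i + 1) = \beta_0 + \beta_1 D_i + \sum_{k} \gamma_k x_{ik} + \varepsilon_i,
\qquad
D_i = \begin{cases} 1 & \text{data publicly deposited} \\ 0 & \text{otherwise} \end{cases}

\frac{(C + 1)\big|_{D=1}}{(C + 1)\big|_{D=0}} = e^{\beta_1}
\quad\Longrightarrow\quad
\text{boost} = e^{\beta_1} - 1 \quad (\text{covariates held fixed})

\text{boost} = 0.09 \;\Rightarrow\; \beta_1 = \ln 1.09 \approx 0.086,
\qquad
[5\%,\, 13\%] \;\Rightarrow\; \beta_1 \in [\ln 1.05,\, \ln 1.13] \approx [0.049,\, 0.122]
```

Under the same assumption, the roughly 30% boost for 2004 and 2005 corresponds to $\beta_1 = \ln 1.3 \approx 0.26$.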
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. CONCLUSION: After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation boost are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
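Counting reuse from accession-number mentions, and the share of deposited datasets reused at least once, can be sketched as below. Assumptions, all ours: GEO series accessions look like `GSE` followed by digits and ArrayExpress experiments like `E-` plus four capital letters, a hyphen, and digits; the `Paper` record and its `own_accessions` field are hypothetical stand-ins for however "third party" was actually determined (the abstract does not say).

```python
import re
from dataclasses import dataclass, field

# GEO series (e.g. GSE1234) or ArrayExpress experiment (e.g. E-MEXP-1234).
ACCESSION_RE = re.compile(r"\b(GSE\d+|E-[A-Z]{4}-\d+)\b")


@dataclass
class Paper:
    pmid: str
    text: str  # full text
    own_accessions: set[str] = field(default_factory=set)  # hypothetical


def third_party_mentions(papers: list[Paper]) -> dict[str, set[str]]:
    """Map each accession to the papers that mention it but did not create it."""
    reuse: dict[str, set[str]] = {}
    for p in papers:
        for acc in set(ACCESSION_RE.findall(p.text)):
            if acc not in p.own_accessions:
                reuse.setdefault(acc, set()).add(p.pmid)
    return reuse


def fraction_reused(deposited: set[str], papers: list[Paper]) -> float:
    """Share of deposited datasets mentioned by at least one third-party paper."""
    reuse = third_party_mentions(papers)
    return sum(1 for acc in deposited if acc in reuse) / len(deposited)
```

Like the estimate in the text, a count built this way is conservative: reuse papers that cite a dataset without quoting its accession number are missed.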