Brassicaceae is an important family at both the agronomic and scientific level. The family not only includes several model species, but it is also becoming an evolutionary model at the family level. However, resolving the phylogenetic relationships within the family has been problematic, and a large-scale molecular phylogeny in terms of generic sampling and number of genes is still lacking. In particular, the deeper relationships within the family, for example between the three major recognized lineages, prove particularly hard to resolve. Using a slow-evolving mitochondrial marker (nad4 intron 1), we reconstructed a comprehensive phylogeny in generic representation for the family. In addition, and because resolution was very low in previous single-marker phylogenies, we adopted a supermatrix approach by concatenating all checked and reliable sequences available on GenBank as well as new sequences, for a total of 207 currently recognized genera and eight molecular markers representing a comprehensive coverage of all three genomes. The supermatrix was dated under an uncorrelated relaxed molecular clock using a direct fossil calibration approach. Finally, a lineage-through-time plot and rates of diversification for the family were generated. The resulting tree, the largest in number of genera and markers sampled to date and covering the whole family in a representative way, provides important insights into the evolution of the family on a broad scale. The backbone of the tree remained largely unresolved, which is interpreted as the consequence of early rapid radiation within the family. The age of the family was inferred to be 37.6 (24.2-49.4) Ma, which largely agrees with previous studies. The ages of all major lineages and tribes are also reported. Analysis of diversification suggests that Brassicaceae underwent a rapid period of diversification after the split with the early diverging tribe Aethionemeae.
Given the dates found here, the family appears to have originated under a warm and humid climate approximately 37 Ma. We suggest that the rapid radiation detected was caused by a global cooling during the Oligocene coupled with a genome duplication event. This duplication could have allowed the family to rapidly adapt to the changing climate.
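To illustrate what a lineage-through-time plot summarizes, the following is a minimal sketch, not the authors' analysis. It assumes a fully bifurcating, ultrametric dated tree whose internal node ages (in Ma) are already known; the function name and the example ages are hypothetical and chosen only for illustration.

```python
def lineages_through_time(node_ages):
    """Return (age, lineage_count) steps, ordered from the root toward the present.

    node_ages: ages (Ma) of the internal nodes of a fully bifurcating,
    ultrametric tree (an assumption of this sketch).
    """
    steps = []
    lineages = 1  # a single stem lineage exists before the root split
    for age in sorted(node_ages, reverse=True):
        lineages += 1  # each bifurcation adds exactly one lineage
        steps.append((age, lineages))
    return steps


# Hypothetical node ages; only the pattern matters, not the values.
example_ages = [37.6, 32.0, 31.5, 30.9, 12.0]
ltt = lineages_through_time(example_ages)
for age, count in ltt:
    print(f"{age:5.1f} Ma: {count} lineages")
```

Plotting the lineage count on a logarithmic axis against age turns a constant diversification rate into a roughly straight line, so a burst of closely spaced nodes, like the three clustered ages in this toy input, shows up as a steep segment.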
We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded extensive data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous SNPs (dN/dS) in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of single-nucleotide polymorphisms (SNPs) exceeds 10 million, i.e. 20-fold higher than found in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found for allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display a lower heterozygosity level. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies.
Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both the generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22–82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4–97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved 16.2–71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in two of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but will at least generate vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, which are becoming increasingly horizontal.
Furthermore, NGS of historical DNA enables recovering crucial genetic information from old type specimens that to date have remained mostly unutilized and, thus, opens up a new frontier for taxonomic research as well.
Dried plant herbarium specimens are potentially a valuable source of DNA. Efforts to obtain genetic information from this source are often hindered by an inability to obtain amplifiable DNA as herbarium DNA is typically highly degraded. DNA post-mortem damage may not only reduce the number of amplifiable template molecules, but may also lead to the generation of erroneous sequence information. A qualitative and quantitative assessment of DNA post-mortem damage is essential to determine the accuracy of molecular data from herbarium specimens. In this study we present an assessment of DNA damage as miscoding lesions in herbarium specimens using 454-sequencing of amplicons derived from plastid, mitochondrial, and nuclear DNA. In addition, we assess DNA degradation as a result of strand breaks and other types of polymerase non-bypassable damage by quantitative real-time PCR. Comparing four pairs of fresh and herbarium specimens of the same individuals we quantitatively assess post-mortem DNA damage, directly after specimen preparation, as well as after long-term herbarium storage. After specimen preparation we estimate the proportion of gene copy numbers of plastid, mitochondrial, and nuclear DNA to be 2.4–3.8% of fresh control DNA and 1.0–1.3% after long-term herbarium storage, indicating that nearly all DNA damage occurs on specimen preparation. In addition, there is no evidence of preferential degradation of organelle versus nuclear genomes. Increased levels of C→T/G→A transitions were observed in old herbarium plastid DNA, representing 21.8% of observed miscoding lesions. We interpret this type of post-mortem DNA damage-derived modification to have arisen from the hydrolytic deamination of cytosine during long-term herbarium storage. Our results suggest that reliable sequence data can be obtained from herbarium specimens.
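As a rough illustration of how a share of C→T/G→A miscoding lesions can be tallied, the sketch below compares reads against a reference base by base. It is not the pipeline used in the study: it assumes reads are already aligned to the reference with equal length and no gaps, and the function name and toy sequences are hypothetical.

```python
from collections import Counter


def miscoding_lesion_shares(reference, reads):
    """Count substitution types and the share that are C>T or G>A.

    Assumes every read is pre-aligned to `reference`, gap-free and of the
    same length (a simplification made for this sketch).
    """
    counts = Counter()
    for read in reads:
        if len(read) != len(reference):
            raise ValueError("read and reference must have equal length")
        for ref_base, read_base in zip(reference, read):
            # Only tally unambiguous base substitutions.
            if ref_base != read_base and ref_base in "ACGT" and read_base in "ACGT":
                counts[f"{ref_base}>{read_base}"] += 1
    total = sum(counts.values())
    ct_ga = counts["C>T"] + counts["G>A"]
    share = ct_ga / total if total else 0.0
    return counts, share


# Toy data: one C>T, one G>A, one C>A, and one read identical to the reference.
toy_reference = "ACGTCG"
toy_reads = ["ATGTCG", "ACATCG", "ACGTAG", "ACGTCG"]
lesion_counts, ct_ga_share = miscoding_lesion_shares(toy_reference, toy_reads)
print(dict(lesion_counts), round(ct_ga_share, 3))
```

C>T and G>A are grouped because they are the same deamination event read from opposite strands; a real analysis would also need to separate such damage from polymerase and sequencing errors, for example by comparison with fresh control material.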