The tree of life is the fundamental biological roadmap for navigating the evolution and properties of life on Earth, and yet remains largely unknown. Even angiosperms (flowering plants) are fraught with data gaps, despite their critical role in sustaining terrestrial life. Today, high-throughput sequencing promises to significantly deepen our understanding of evolutionary relationships. Here, we describe a comprehensive phylogenomic platform for exploring the angiosperm tree of life, comprising a set of open tools and data based on the 353 nuclear genes targeted by the universal Angiosperms353 sequence capture probes. The primary goals of this paper are to (i) document our methods, (ii) describe our first data release and (iii) present a novel open data portal, the Kew Tree of Life Explorer (https://treeoflife.kew.org). We aim to generate novel target sequence capture data for all genera of flowering plants, exploiting natural history collections such as herbarium specimens, and augment it with mined public data. Our first data release, described here, is the most extensive nuclear phylogenomic dataset for angiosperms to date, comprising 3,099 samples validated by DNA barcode and phylogenetic tests, representing all 64 orders, 404 families (96%) and 2,333 genera (17%). A “first pass” angiosperm tree of life was inferred from the data, which totalled 824,878 sequences, 489,086,049 base pairs, and 532,260 alignment columns, for interactive presentation in the Kew Tree of Life Explorer. This species tree was generated using methods that were rigorous, yet tractable at our scale of operation. Despite limitations pertaining to taxon and gene sampling, gene recovery, models of sequence evolution and paralogy, the tree strongly supports existing taxonomy, while challenging numerous hypothesized relationships among orders and placing many genera for the first time.
The validated dataset, species tree and all intermediates are openly accessible via the Kew Tree of Life Explorer and will be updated as further data become available. This major milestone towards a complete tree of life for all flowering plant species opens doors to a highly integrated future for angiosperm phylogenomics through the systematic sequencing of standardised nuclear markers. Our approach has the potential to serve as a much-needed bridge between the growing movement to sequence the genomes of all life on Earth and the vast phylogenomic potential of the world’s natural history collections.
The world’s herbaria collectively house millions of diverse plant specimens, including endangered or extinct species and type specimens. Unlocking genetic data from the typically highly degraded DNA obtained from herbarium specimens was difficult until the arrival of high-throughput sequencing approaches, which can be applied to low quantities of severely fragmented DNA. Target enrichment involves using short molecular probes that hybridise and capture genomic regions of interest for high-throughput sequencing. In this study on herbariomics, we used this targeted sequencing approach and the Angiosperms353 universal probe set to recover up to 351 nuclear genes from 435 herbarium specimens that are up to 204 years old and span the breadth of angiosperm diversity. We show that on average 207 genes were successfully retrieved from herbarium specimens, although the mean number of genes retrieved and target enrichment efficiency are significantly higher for silica gel-dried specimens. Forty-seven target nuclear genes were recovered from a herbarium specimen of the critically endangered St Helena boxwood, Mellissia begoniifolia, collected in 1815. Herbarium specimens yield significantly less high-molecular-weight DNA than silica gel-dried specimens, and genomic DNA quality declines with sample age, which is negatively correlated with target enrichment efficiency. Climate, taxon-specific traits, and collection strategies additionally impact target sequence recovery. We also detected taxonomic bias in targeted sequencing outcomes for the 10 most numerous angiosperm families that were investigated in depth.
We recommend that (1) for species distributed in wet tropical climates, silica gel-dried specimens should be used preferentially; (2) for species distributed in seasonally dry tropical climates, herbarium and silica gel-dried specimens yield similar results, and either collection can be used; (3) taxon-specific traits should be explored and established for effective optimisation of taxon-specific studies using herbarium specimens; (4) all herbarium sheets should, in future, be annotated with details of the preservation method used; (5) long-term storage of herbarium specimens should be in stable, low-humidity, and low-temperature environments; and (6) targeted sequencing with universal probes, such as Angiosperms353, should be investigated closely as a new approach for DNA barcoding that will ensure better exploitation of herbarium specimens than traditional Sanger sequencing approaches.
be merged for downstream analyses. Moreover, our study contributes to the growing consensus that targeted sequencing data are a powerful tool in resolving rapid radiations.
Anemopaegma species have the largest genomes within the Lamiales possibly due to the large amount of repetitive sequences and IR expansion. Variation was higher in coding regions than in noncoding regions, and some genes were identified as markers for differentiation between species. The use of the entire chloroplast genome gave better phylogenetic resolution of the taxonomic groups. We found that two forms of A. acutifolium result from different maternal lineages.
The inference of evolutionary relationships in the species-rich family Orchidaceae has hitherto relied heavily on plastid DNA sequences and limited taxon sampling. Previous studies have provided a robust plastid phylogenetic framework, which was used to classify orchids and investigate the drivers of orchid diversification. However, the extent to which phylogenetic inference based on the plastid genome is congruent with the nuclear genome has been only poorly assessed. METHODS: We inferred higher-level phylogenetic relationships of orchids based on likelihood and ASTRAL analyses of 294 low-copy nuclear genes sequenced using the Angiosperms353 universal probe set for 75 species (representing 69 genera, 16 tribes, 24 subtribes) and a concatenated analysis of 78 plastid genes for 264 species (117 genera, 18 tribes, 28 subtribes). We compared phylogenetic informativeness and support for the nuclear and plastid phylogenetic hypotheses. RESULTS: Phylogenetic inference using nuclear data sets provides well-supported orchid relationships that are highly congruent between analyses. Comparisons of nuclear gene trees and a plastid supermatrix tree showed that the trees are mostly congruent, but revealed instances of strongly supported phylogenetic incongruence in both shallow and deep time. The phylogenetic informativeness of individual Angiosperms353 genes is in general better than that of most plastid genes. CONCLUSIONS: Our study provides the first robust nuclear phylogenomic framework for Orchidaceae and an assessment of intragenomic nuclear discordance, plastid-nuclear tree incongruence, and phylogenetic informativeness across the family. Our results also demonstrate what has long been known but rarely thoroughly documented: nuclear and plastid phylogenetic trees can contain strongly supported discordances, and this incongruence must be reconciled prior to interpretation in evolutionary studies, such as taxonomy, biogeography, and character evolution.