Natural history collections are leading successful large-scale projects of specimen digitization (images, metadata, DNA barcodes), thereby transforming taxonomy into a big data science. Yet, little effort has been directed towards safeguarding and subsequently mobilizing the considerable amount of original data generated during the process of naming 15,000–20,000 species every year. From the perspective of alpha-taxonomists, we provide a review of the properties and diversity of taxonomic data, assess their volume and use, and establish criteria for optimizing data repositories. We surveyed 4113 alpha-taxonomic studies in representative journals for 2002, 2010, and 2018, and found an increasing yet comparatively limited use of molecular data in species diagnosis and description. In 2018, of the 2661 papers published in specialized taxonomic journals, molecular data were widely used in mycology (94%), regularly in vertebrates (53%), but rarely in botany (15%) and entomology (10%). Images play an important role in taxonomic research on all taxa, with photographs used in >80% and drawings in 58% of the surveyed papers. The use of omics (high-throughput) approaches or 3D documentation is still rare. Improved archiving strategies for metabarcoding consensus reads, genome and transcriptome assemblies, and chemical and metabolomic data could help to mobilize the wealth of high-throughput data for alpha-taxonomy. Because long-term—ideally perpetual—data storage is of particular importance for taxonomy, energy footprint reduction via less storage-demanding formats is a priority if their information content suffices for the purpose of taxonomic studies. Whereas taxonomic assignments are quasifacts for most biological disciplines, they remain hypotheses pertaining to evolutionary relatedness of individuals for alpha-taxonomy. 
For this reason, an improved reuse of taxonomic data, including machine-learning-based species identification and delimitation pipelines, requires a cyberspecimen approach: linking data via unique specimen identifiers, and thereby making them findable, accessible, interoperable, and reusable for taxonomic research. This poses both qualitative challenges to adapt the existing infrastructure of data centers to a specimen-centered concept and quantitative challenges to host and connect an estimated ≤2 million images produced per year by alpha-taxonomic studies, plus many millions of images from digitization campaigns. Of the 30,000–40,000 taxonomists globally, many are thought to be nonprofessionals, and capturing the data for online storage and reuse therefore requires low-complexity submission workflows and cost-free repository use. Expert taxonomists are the main stakeholders able to identify and formalize the needs of the discipline; their expertise is needed to implement the envisioned virtual collections of cyberspecimens. [Big data; cyberspecimen; new species; omics; repositories; specimen identifier; taxonomy; taxonomic data.]
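The idea of linking data via unique specimen identifiers can be illustrated with a minimal sketch. It is not a description of any existing repository: the `DataRecord` class, the `build_cyberspecimens` function, the identifier format (`SPEC-0001`), and the repository locations are all hypothetical names chosen for illustration. The sketch only shows how records held in different places could be grouped into one virtual specimen when they share an identifier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRecord:
    specimen_id: str  # hypothetical unique specimen identifier
    kind: str         # e.g. "image", "sequence", "measurement"
    location: str     # hypothetical repository accession or path


def build_cyberspecimens(records):
    """Group data records by their shared specimen identifier."""
    linked = defaultdict(list)
    for record in records:
        linked[record.specimen_id].append(record)
    return dict(linked)


# Invented example records held in two different (hypothetical) repositories.
records = [
    DataRecord("SPEC-0001", "image", "repo-a/img/123"),
    DataRecord("SPEC-0001", "sequence", "repo-b/seq/456"),
    DataRecord("SPEC-0002", "image", "repo-a/img/789"),
]

cyberspecimens = build_cyberspecimens(records)
kinds = sorted(r.kind for r in cyberspecimens["SPEC-0001"])
print(kinds)
```

In this toy setting, everything known about a specimen is retrieved with a single lookup on its identifier, which is the property a specimen-centered repository design would need to preserve across data centers.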
Background: Pallenopsis patagonica (Hoek, 1881) is a morphologically and genetically variable sea spider species whose taxonomic classification is challenging. It is currently considered a species complex comprising several genetic lineages, many of which have not been formally described as species. Members of this species complex occur on the Patagonian and Antarctic continental shelves as well as around sub-Antarctic islands. These habitats have been strongly influenced by historical large-scale glaciations, and previous studies suggested that communities were limited to very few refugia during glacial maxima. Allopatric speciation in these independent refugia is therefore regarded as a common mechanism leading to high biodiversity of marine benthic taxa in the high-latitude Southern Hemisphere. However, other mechanisms such as ecological speciation have rarely been considered or tested. We therefore conducted an integrative morphological and genetic study on the P. patagonica species complex to i) resolve species diversity using a target hybrid enrichment approach to obtain multiple genomic markers, ii) find morphological characters and analyze morphometric measurements to distinguish species, and iii) investigate the speciation processes that led to multiple lineages within the species complex.
Results: Phylogenomic results support most of the previously reported lineages within the P. patagonica species complex, and morphological data show that several lineages are distinct species with diagnostic characters. Two lineages are proposed as new species: P. aulaeturcarum sp. nov. Dömel & Melzer, 2019 and P. obstaculumsuperavit sp. nov. Dömel, 2019. However, not all lineages could be distinguished morphologically; these likely represent cryptic species that can only be identified with genetic tools. Further, morphometric data from 135 measurements showed high variability within and between species without clear support for adaptive divergence in sympatry.
Conclusions: We generated an unprecedented molecular data set for members of the P. patagonica sea spider species complex with a target hybrid enrichment approach, which we combined with extensive morphological and morphometric analyses to investigate the taxonomy, phylogeny and biogeography of this group. The extensive data set enabled us to delineate species boundaries, on the basis of which we formally described two new species. No consistent evidence for positive selection was found, making speciation in allopatric glacial refugia the most likely model of speciation.
Electronic supplementary material: The online version of this article (10.1186/s12983-019-0316-y) contains supplementary material, which is available to authorized users.