High-throughput DNA sequencing has the potential to accelerate species discovery if it is able to recognize evolutionary entities from sequence data that are comparable to species. The general mixed Yule-coalescent (GMYC) model estimates the species boundary from DNA surveys by identifying independently evolving lineages as a transition from coalescent to speciation branching patterns on a phylogenetic tree. Applied here to 12 families from 4 orders of insects in Madagascar, the model delineated 370 putative species from mitochondrial DNA sequence variation among 1614 individuals. These were compared with data from the nuclear genome and morphological identification and found to be highly congruent (98% and 94%, respectively). We developed a modified GMYC that allows for a variable transition from coalescent to speciation among lineages. This revised model increased the congruence with morphology (97%), suggesting that a variable threshold better reflects the clustering of sequence data into biological species. Local endemism was pronounced in all 5 insect groups. Most species (60-91%) and haplotypes (88-99%) were found at only 1 of the 5 study sites (40-1000 km apart). This pronounced endemism resulted in a 37% increase in species numbers using diagnostic nucleotides in a population aggregation analysis. Sample sizes between 7 and 10 individuals represented a threshold above which there was minimal increase in genetic diversity, broadly agreeing with coalescent theory and other empirical studies. Our results from > 1.4 Mb of empirical data suggest that the GMYC model captures species boundaries comparable to those from traditional methods without the need for prior hypotheses of population coherence. This provides a method of species discovery and biodiversity assessment using single-locus data from mixed or environmental samples while building a globally available taxonomic database for future identifications.
Eight years after DNA barcoding was formally proposed on a large scale, CO1 sequences are rapidly accumulating from around the world. While studies to date have mostly targeted local or regional species assemblages, the recent launch of the global iBOL project (International Barcode of Life) highlights the need to understand the effects of geographical scale on barcoding's goals. Sampling has been central in the debate on DNA barcoding, but the effect of the geographical scale of sampling has not yet been thoroughly and explicitly tested with empirical data. Here, we present a CO1 data set of aquatic predaceous diving beetles of the tribe Agabini, sampled throughout Europe, and use it to investigate how the geographic scale of sampling affects 1) the estimated intraspecific variation of species, 2) the genetic distance to the most closely related heterospecific, 3) the ratio of intraspecific and interspecific variation, 4) the frequency of taxonomically recognized species found to be monophyletic, and 5) query identification performance based on 6 different species assignment methods. Intraspecific variation was significantly correlated with the geographical scale of sampling (R-squared = 0.7), and more than half of the species with 10 or more sampled individuals (N = 29) showed intraspecific variation higher than 1% sequence divergence. In contrast, the distance to the closest heterospecific showed a significant decrease with increasing geographical scale of sampling. The average genetic distance dropped from > 7% for samples within 1 km, to < 3.5% for samples up to > 6000 km apart. Over a third of the species were not monophyletic, and the proportion increased through locally, nationally, regionally, and continentally restricted subsets of the data. The success of identifying queries decreased with increasing spatial scale of sampling; liberal methods declined from 100% to around 90%, whereas strict methods dropped to below 50% at continental scales.
The proportion of query identifications considered uncertain (more than one species < 1% distance from query) escalated from zero at local, to 50% at continental scale. Finally, by resampling the most widely sampled species we show that even if samples are collected to maximize the geographical coverage, up to 70 individuals are required to sample 95% of intraspecific variation. The results show that the geographical scale of sampling has a critical impact on the global application of DNA barcoding. Scale-effects result from the relative importance of different processes determining the composition of regional species assemblages (dispersal and ecological assembly) and global clades (demography, speciation, and extinction). The incorporation of geographical information, where available, will be required to obtain identification rates at global scales equivalent to those in regional barcoding studies. Our result hence provides an impetus for both smarter barcoding tools and sprouting national barcoding initiatives—smaller geographical scales deliver higher a...
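The quantities discussed in this abstract (intraspecific variation, distance to the closest heterospecific, and queries flagged as uncertain when more than one species lies within 1% of the query) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the uncorrected p-distance, the function names, and the toy sequences are all assumptions made for illustration, and the abstract does not state which distance measure or which 6 assignment methods were used.

```python
from itertools import combinations

def p_distance(a, b):
    """Uncorrected p-distance (assumed measure): share of differing sites
    among aligned positions where both sequences have an unambiguous base."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def max_intraspecific(records):
    """records: list of (species, aligned sequence).
    Returns {species: largest pairwise distance within that species}."""
    result = {}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        if sp1 == sp2:
            result[sp1] = max(result.get(sp1, 0.0), p_distance(s1, s2))
    return result

def nearest_heterospecific(records):
    """Returns {species: smallest distance to any sequence of another species}."""
    result = {}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        if sp1 != sp2:
            d = p_distance(s1, s2)
            for sp in (sp1, sp2):
                result[sp] = min(result.get(sp, 1.0), d)
    return result

def classify_query(query, records, threshold=0.01):
    """Flag a query as 'uncertain' when more than one species has a
    reference sequence closer than the threshold (1% here)."""
    hits = {sp for sp, seq in records if p_distance(query, seq) < threshold}
    if len(hits) == 1:
        return "identified", hits
    if len(hits) > 1:
        return "uncertain", hits
    return "no match", hits

def mutate(seq, positions):
    """Hypothetical helper for toy data: substitute the base at each position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for i in positions:
        chars[i] = swap[chars[i]]
    return "".join(chars)

# Toy reference library (invented, 200 aligned sites per sequence).
base = "ACGT" * 50
records = [
    ("A", base),
    ("A", mutate(base, [0])),
    ("B", mutate(base, [10])),
    ("C", mutate(base, range(100, 120))),
]

intra = max_intraspecific(records)
nearest = nearest_heterospecific(records)
status_close, hits_close = classify_query(base, records)
status_far, hits_far = classify_query(mutate(base, range(100, 120)), records)
```

In this toy library, species A and B differ by a single site, so a query matching A also falls within 1% of B and is flagged as uncertain, whereas a query matching the well-separated species C is identified unambiguously. This mirrors, in miniature, the situation the abstract describes as becoming more common when sampling is extended geographically.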