Species occurrence records provide the basis for many biodiversity studies. They derive from georeferenced specimens deposited in natural history collections and visual observations, such as those obtained through various mobile applications. Given the rapid increase in availability of such data, the control of quality and accuracy constitutes a particular concern. Automatic filtering is a scalable and reproducible means to identify potentially problematic records and tailor datasets from public databases such as the Global Biodiversity Information Facility (GBIF; http://www.gbif.org) for biodiversity analyses. However, it is unclear how much data may be lost by filtering, whether the same filters should be applied across all taxonomic groups, and what the effect of filtering is on common downstream analyses. Here, we evaluate the effect of 13 recently proposed filters on the inference of species richness patterns and automated conservation assessments for 18 Neotropical taxa, including terrestrial and marine animals, fungi, and plants downloaded from GBIF. We find that a total of 44.3% of the records are potentially problematic, with large variation across taxonomic groups (25–90%). A small fraction of records was identified as erroneous in the strict sense (4.2%), and a much larger proportion as unfit for most downstream analyses (41.7%). Filters for duplicated information, collection year, and basis of record, as well as for coordinates in urban areas, or for terrestrial taxa in the sea or marine taxa on land, have the greatest effect. Automated filtering can help in identifying problematic records, but requires customization of which tests and thresholds should be applied to the taxonomic group and geographic area under focus. Our results stress the importance of thorough recording and exploration of the meta-data associated with species records for biodiversity research.
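The kind of automated flagging described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `flag_records` and `summarize`, the year threshold, and the accepted basis-of-record values are assumptions chosen for illustration, and the field names are taken from Darwin Core terms (`decimalLatitude`, `decimalLongitude`, `year`, `basisOfRecord`). Tests that need external geographic data (urban areas, land and sea polygons) are left out.

```python
from collections import Counter

# Hypothetical thresholds; the abstract does not state the values used.
MIN_YEAR = 1945
ACCEPTED_BASIS = {"PRESERVED_SPECIMEN", "HUMAN_OBSERVATION"}


def flag_records(records, min_year=MIN_YEAR, accepted_basis=ACCEPTED_BASIS):
    """Return one set of flag names per record (an empty set means no test failed)."""
    seen = set()
    flags_per_record = []
    for rec in records:
        flags = set()
        lat = rec.get("decimalLatitude")
        lon = rec.get("decimalLongitude")
        # Coordinates missing or outside the valid latitude/longitude range.
        if lat is None or lon is None or not (-90 <= lat <= 90 and -180 <= lon <= 180):
            flags.add("invalid_coordinates")
        # Exact 0/0 coordinates usually indicate a data-entry placeholder.
        elif lat == 0 and lon == 0:
            flags.add("zero_coordinates")
        # Records without a year, or older than the chosen threshold.
        year = rec.get("year")
        if year is None or year < min_year:
            flags.add("collection_year")
        # Record types outside the accepted set.
        if rec.get("basisOfRecord") not in accepted_basis:
            flags.add("basis_of_record")
        # Same species at the same coordinates as an earlier record.
        key = (rec.get("species"), lat, lon)
        if key in seen:
            flags.add("duplicate")
        seen.add(key)
        flags_per_record.append(flags)
    return flags_per_record


def summarize(flags_per_record):
    """Count flagged records overall and per test."""
    per_test = Counter(flag for flags in flags_per_record for flag in flags)
    n_flagged = sum(1 for flags in flags_per_record if flags)
    return {
        "n_records": len(flags_per_record),
        "n_flagged": n_flagged,
        "per_test": dict(per_test),
    }
```

Keeping the flags per record, rather than dropping records immediately, makes it possible to report how much each test contributes and to adjust the set of tests and thresholds for a given taxonomic group, which is the kind of customization the abstract calls for.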
The genus Alcantarea comprises nearly 30 species endemic to rocky outcrops in eastern Brazil. Most species are ornamental and several are threatened by habitat loss and overcollection. In this paper we examine the phylogenetics of Alcantarea and its relationship with the Brazilian members of Vriesea, a genus of which Alcantarea has been treated as a subgenus. We discuss the morphological evolution of stamen position and its implications for pollination, as well as the occurrence of Alcantarea in the rocky savanna-like vegetation of the Espinhaço mountain range. DNA sequence data from two plastid markers (trnK-rps16, trnC-petN) and a low-copy nuclear gene (Floricaula/Leafy), together with 20 nuclear microsatellite loci, were used to construct phylogenetic and Neighbor Joining trees for the genus. Alcantarea is well supported as monophyletic in both Bayesian and parsimony analyses, but sections of Vriesea, represented by the eastern Brazilian species, appear paraphyletic. Microsatellites delimit geographically isolated species groups. Nevertheless, individuals belonging to a single species may appear related to distinct clusters of species, suggesting that hybridization, homoplasy, and/or incomplete lineage sorting also influence analyses based on such markers and may explain some unexpected results. Alcantarea brasiliana is hypothesized to be a putative hybrid between A. imperialis and A. geniculata. Spreading stamens, a floral characteristic assumed to be related to chiropterophily, apparently evolved multiple times within the genus, and invasion of rocky savanna-like vegetation by Atlantic rainforest ancestors seems to have occurred multiple times as well.
Aim: Pilosocereus is one of the richest and most widespread genera of columnar cacti, extending from south‐west USA to southern Brazil. Most species occur in the seasonally dry tropical forest biome but can also be found in xeric microhabitats inside woody savannas (Cerrado) and moist forests (Brazilian Atlantic forest). The genus exhibits a highly disjunct distribution across the Neotropics. Using a 90% complete species‐level phylogeny, we reconstructed the spatio‐temporal evolution of Pilosocereus to explore the historical factors behind the species richness of Neotropical dry formations.
Location: South America, Mesoamerica, Caribbean, south‐western North America.
Taxon: Genus Pilosocereus (Cactaceae, Cactoideae, Cereeae).
Methods: We used plastid and nuclear DNA sequences and Bayesian inference to estimate phylogenetic relationships and lineage divergence times. Ancestral ranges were inferred within the Pilosocereus subgenus Pilosocereus s. s. clade using the Dispersal–Extinction–Cladogenesis model in a Bayesian framework to account for parameter estimation uncertainty and the effect of geographical distance on dispersal rates.
Results: Pilosocereus was recovered as polyphyletic, with representatives of other Cereeae nested within. The Pilosocereus subgenus Pilosocereus s. s. clade originated around the Pliocene–Pleistocene transition (2.7 Ma), probably within the Caatinga seasonally dry tropical forest (SDTF) formation. Species divergences were dated to the Middle and Upper Pleistocene, often constrained to the same geographic region but also associated with migration events to other xeric habitats in Mesoamerica and northern South America; dispersal rates were not dependent on distance.
Main conclusions: Diversification dynamics in the Pilosocereus subgenus Pilosocereus s. s. clade agree with other infrageneric studies in cacti. Species divergence was rapid, driven by in situ diversification and migration events between SDTF dry formations and xeric microhabitats within other biomes, and probably linked to Pleistocene climatic changes. This dynamic history differs from that found in woody SDTF lineages, which are older and characterized by low dispersal rates and long‐term isolation.
Species occurrence records provide the basis for many biodiversity studies. They derive from geo-referenced specimens deposited in natural history collections and visual observations, such as those obtained through various mobile applications. Given the rapid increase in availability of such data, the control of quality and accuracy constitutes a particular concern. Automatic flagging and filtering are a scalable and reproducible means to identify potentially problematic records in datasets from public databases such as the Global Biodiversity Information Facility (GBIF; www.gbif.org). However, it is unclear how much data may be lost by filtering, whether the same tests should be applied across all taxonomic groups, and what the effect of filtering is on common downstream analyses. Here, we evaluate the effect of 13 recently proposed filters on the inference of species richness patterns and automated conservation assessments for 18 Neotropical taxa, including terrestrial and marine animals, fungi, and plants, downloaded from GBIF. We find that 29–90% of the records are potentially erroneous, with large variation across taxonomic groups. Tests for duplicated information, collection year, and basis of record, as well as for urban areas and for coordinates of terrestrial taxa in the sea or marine taxa on land, have the greatest effect. While many flagged records might not be de facto erroneous, they could be overly imprecise and increase uncertainty in downstream analyses. Automated flagging can help in identifying problematic records, but requires customization of which tests and thresholds should be applied to the taxonomic group and geographic area under focus. Our results stress the importance of thorough exploration of the meta-data associated with species records for biodiversity research.
Publicly available species distribution data have become a crucial resource in biodiversity research, including studies in ecology, biogeography, systematics and conservation biology. In particular, the availability of digitized collections from museums and herbaria, and of citizen science observations, has increased drastically over the last few years. As of today, the largest public aggregator for geo-referenced species occurrence data, the Global Biodiversity Information Facility (www.gbif.org), provides access to more than 1.3 billion geo-referenced occurrence records for species from across the globe and the tree of life.
A central challenge to the use of these publicly available species occurrence data in research is erroneous geographic coordinates (Anderson et al. 2016). Errors mostly arise because public databases integrate records collected with different methodologies in different places and at different times, often without centralized curation and with only rudimentary meta-data. For instance, erroneous coordinates caused by data-entry errors or automated geo-referencing from vague locality descriptions are common (Maldonado et al. 2015; Yesson et al. 2007)...