The digitization of museum collections and an explosion in citizen science initiatives have resulted in a wealth of data that can be useful for understanding the global distribution of biodiversity, provided that the well‐documented biases inherent in unstructured opportunistic data are accounted for. While traditionally used to model imperfect detection using structured data from systematic surveys of wildlife, occupancy models provide a framework for modeling the imperfect collection process that results in digital specimen data. In this study, we explore methods for adapting occupancy models for use with biased opportunistic occurrence data from museum specimens and citizen science platforms using seven species of Anacardiaceae in Florida as a case study. We explored two methods of incorporating information about collection effort to inform our uncertainty around species presence: 1) filtering the data to exclude collectors unlikely to collect the focal species and 2) incorporating collection covariates (collection type, time of collection and history of previous detections) into a model of collection probability. We found that the best models incorporated both the background data filtration step and the collector covariates. Month, method of collection and whether a collector had previously collected the focal species were important predictors of collection probability. Efforts to standardize the metadata associated with data collection will improve efforts to model the spatial distribution of a variety of species.
Species distribution models are useful for estimating the distribution and environmental preferences of rare species, but these same species are challenging to model on account of sparse data. We contrast a traditional single-species approach (generalized linear models, GLMs) with two promising frameworks for modeling rare species: ensembles of small models (ESMs), which average across simple models; and multispecies distribution models (MSDMs), which allow rarer species to benefit from statistical 'borrowing of strength' from more common species. Using a virtual species within a community of real species, we evaluated how model accuracy was influenced by the number of occurrences of the rare species (N = 2-64), niche breadth, and similarity to more numerous species' niches. For discriminating between presence and absence, ESMs with just linear terms (ESM-L) performed best for N ≤ 4, whereas GLMs and ESMs with polynomial terms (ESM-P) were best for N ≥ 8. For calibrating the species' response to influential variables, the MSDM hierarchical modeling of species communities (HMSC) and ESM-P were best for species with niches similar to those of other species. For species with dissimilar niches, ESM-P did best for N ≥ 8, but no model was well calibrated for smaller sample sizes. For identifying uninfluential variables, ESM-L and species archetype models (SAMs), a type of MSDM, did well for N ≤ 4, and ESM-L for N ≥ 8. Models of species with narrow niches dissimilar to others had the highest discrimination capacity compared to models for generalist species and/or species with niches similar to other species' niches. 'Borrowing of strength' in MSDMs can assist with some inference tasks, but does not necessarily improve predictions for rare species; simpler, single-species models may be better at a given task. The best algorithm depends on modeling goal (discrimination versus calibration), sample size, and niche breadth and similarity.
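The averaging step behind an ensemble of small models can be illustrated with a minimal sketch. The abstract says only that ESMs "average across simple models"; the weighting used here (Somers' D = 2·AUC − 1, with non-positive weights dropped) is an assumption based on a common ESM convention, and the function names `auc` and `ensemble` are hypothetical. For brevity the weights are computed on the same labels used for illustration, whereas a real analysis would use held-out data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble(predictions, labels):
    """Weighted mean of small-model predictions.

    Assumed weighting: Somers' D (2*AUC - 1), floored at zero so that
    models no better than random do not contribute.
    """
    weights = [max(0.0, 2.0 * auc(p, labels) - 1.0) for p in predictions]
    total = sum(weights)
    if total == 0:
        raise ValueError("no small model performed better than random")
    return [sum(w * p[i] for w, p in zip(weights, predictions)) / total
            for i in range(len(labels))]


# Hypothetical predictions from three small (e.g. two-predictor) models
labels = [1, 1, 0, 0]
model_a = [0.9, 0.8, 0.2, 0.1]   # perfect ranking
model_b = [0.6, 0.4, 0.5, 0.3]   # partly correct ranking
model_c = [0.1, 0.2, 0.8, 0.9]   # worse than random, gets zero weight
combined = ensemble([model_a, model_b, model_c], labels)
```

Because each small model has few parameters, it can be fitted with very few occurrences; the ensemble then recovers some of the information a larger model would have used.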
Aim: Museum and herbarium specimen records are frequently used to assess the conservation status of species and their responses to climate change. Typically, occurrences with imprecise geolocality information are discarded because they cannot be matched confidently to environmental conditions and are thus expected to increase uncertainty in downstream analyses. However, using only precisely georeferenced records risks undersampling of the environmental and geographical distributions of species. We present two related methods to allow the use of imprecisely georeferenced occurrences in biogeographical analysis. Innovation: Our two procedures assign imprecise records to the (1) locations or (2) climates that are closest to the geographical or environmental centroid of the precise records of a species. For virtual species, including imprecise records alongside precise records improved the accuracy of ecological niche models projected to the present and the future, especially for species with c. 20 or fewer precise occurrences. Using only precise records underestimated loss of suitable habitat and overestimated the amount of suitable habitat in both the present and the future. Including imprecise records also improves estimates of niche breadth and extent of occurrence. An analysis of 44 species of North American Asclepias (Apocynaceae) yielded similar results. Main conclusions: Existing studies examining the effects of spatial imprecision typically compare outcomes based on precise records against the same records with spatial error added to them. However, in real‐world cases, analysts possess a mix of precise and imprecise records and must decide whether to retain or discard the latter. Discarding imprecise records can undersample the geographical and environmental distributions of species and lead to mis‐estimation of responses to past and future climate change.
Our method, for which we provide a software implementation in the enmSdmX package for R, is simple to use and can help leverage the large number of specimen records that are typically deemed “unusable” because of spatial imprecision in their geolocation.
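The first of the two procedures described in this abstract (assigning an imprecise record to the location closest to the geographical centroid of the precise records) can be sketched as follows. This is not the enmSdmX API: the function names are hypothetical, coordinates are treated as planar for simplicity, and the candidate points are assumed to stand for locations inside the region (for example, a county) to which the imprecise record could be georeferenced.

```python
from math import hypot


def centroid(points):
    """Arithmetic mean of (x, y) points; assumes planar coordinates."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def assign_imprecise(precise, candidates):
    """Return the candidate location nearest the centroid of precise records.

    `candidates` are hypothetical locations within the area of
    uncertainty of one imprecise record.
    """
    cx, cy = centroid(precise)
    return min(candidates, key=lambda p: hypot(p[0] - cx, p[1] - cy))


# Illustrative coordinates only, not real specimen data
precise = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
candidates = [(5.0, 5.0), (2.0, 2.0), (6.0, 1.0)]
assigned = assign_imprecise(precise, candidates)
```

The second procedure would follow the same pattern with environmental variables in place of coordinates, so that the record is matched to the candidate climate nearest the environmental centroid.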
Determining the distribution and environmental preferences of rare species threatened by global change has long been a focus of conservation. The typical minimum suggested number of occurrences ranges from ~5 to 30, but many species are represented by even fewer occurrences. However, several newer methods may be able to accommodate such low sample sizes. These include Bayesian joint species distribution models (JSDMs), which allow rare species to statistically "borrow strength" from more common species with similar niches, and ensembles of small models (ESMs), which reduce the number of parameters by averaging smaller models. Here we explore how niche breadth and niche position relative to other species influence model performance at low sample sizes (N=2, 4, 8, 16, 32, 64) using virtual species within a community of real species. ESMs were better at discrimination tasks for most species, and yielded better-than-random accuracy even for N=2. In contrast, "traditional" single-species models or JSDMs were better able to estimate the underlying response curves of variables that influenced the niche, but at low sample sizes were also more likely to incorrectly identify unimportant factors as influential. Species with niches that were narrow and peripheral to the available environmental space yielded models with better discrimination capacity than species with broad niches or niches that were similar to those of other species, regardless of whether the modeling algorithm allowed for borrowing of strength. Our study suggests that some rare species can be modeled reliably at very low sample sizes, although the best algorithm depends on the number of occurrences and whether the niche or distribution is the focus.