Summary
1. Presence-only data are widely used for species distribution modelling, and point process regression models are a flexible tool with considerable potential for this problem when data arise as point events.
2. In this paper, we review point process models, some of their advantages and some common methods of fitting them to presence-only data.
3. Advantages include (but are not limited to) clarification of what response variable is being modelled; a framework for objectively choosing the number and location of quadrature points (commonly referred to as pseudo-absences or 'background points'); clarity of model assumptions and tools for checking them; models that handle spatial dependence between points when it is present; and ways forward on difficult issues such as accounting for sampling bias.
4. Point process models are related to some common approaches to presence-only species distribution modelling, which means that a variety of software tools can be used to fit them, including MAXENT and generalised linear modelling software.
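Point 3 refers to quadrature points. To make that concrete, the standard Berman-Turner formulation is sketched below; the notation is our own assumption and is not taken from the summary. It assumes a log-linear intensity over a study region A, with quadrature points s_1, ..., s_m whose first n entries are the presence points.

```latex
% Log-likelihood of an inhomogeneous Poisson process with presences s_1,...,s_n
% and log-linear intensity \lambda(s) = \exp\{x(s)^\top \beta\}
\ell(\beta) = \sum_{i=1}^{n} \log \lambda(s_i) - \int_{A} \lambda(s)\, ds

% Quadrature approximation: weights w_j with \sum_j w_j \approx |A|,
% indicator z_j = 1 for presences and 0 for background points,
% pseudo-response y_j = z_j / w_j
\ell(\beta) \approx \sum_{j=1}^{m} w_j \left\{ y_j \log \lambda(s_j) - \lambda(s_j) \right\}
```

Since w_j y_j = z_j, the first term recovers the sum over presences, and the second term is a quadrature estimate of the integral. The right-hand side is a weighted Poisson log-likelihood (up to terms free of β), which is why generalised linear modelling software can fit these models. Because only the integral approximation depends on the background points, one can add quadrature points until the estimates stop changing.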
A large array of species distribution model (SDM) approaches has been developed for explaining and predicting the occurrences of individual species or species assemblages. Given the wealth of existing models, it is unclear which models perform best for interpolation or extrapolation of existing data sets, particularly when one is concerned with species assemblages. We compared the predictive performance of 33 variants of 15 widely applied and recently emerged SDMs in the context of multispecies data, including both joint SDMs, which model multiple species together, and stacked SDMs, which model each species individually and combine the predictions afterward. We offer a comprehensive evaluation of these SDM approaches by examining their performance in predicting withheld empirical validation data of different sizes, representing five different taxonomic groups, for prediction tasks related to both interpolation and extrapolation. We measure predictive performance by 12 measures of accuracy, discrimination power, calibration, and precision of predictions, at the biological levels of species occurrence, species richness, and community composition. Our results show large variation among the models in their predictive performance, especially for communities comprising many rare species. The results do not reveal any major trade‐offs among measures of model performance; the same models generally performed well in terms of accuracy, discrimination, and calibration, and at the biological levels of individual species, species richness, and community composition. In contrast, the models that gave the most precise predictions were not well calibrated, suggesting that poorly performing models can make overconfident predictions. However, none of the models performed well for all prediction tasks.
As a general strategy, we therefore propose that researchers fit a small set of models showing complementary performance, and then apply a cross‐validation procedure involving separate data to establish which of these models performs best for the goal of the study.
Modeling the spatial distribution of a species is a fundamental problem in ecology. A number of modeling methods have been developed, an extremely popular one being MAXENT, a maximum entropy modeling approach. In this article, we show that MAXENT is equivalent to a Poisson regression model and hence is related to a Poisson point process model, differing only in the intercept term, which is scale-dependent in MAXENT. We illustrate a number of improvements to MAXENT that follow from these relations. In particular, a point process model approach facilitates methods for choosing the appropriate spatial resolution, assessing model adequacy, and choosing the LASSO penalty parameter, all currently unavailable to MAXENT. The equivalence result represents a significant step in the unification of the species distribution modeling literature.
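The equivalence above can be illustrated from the Poisson point process side. The following is a minimal sketch, assuming a log-linear intensity, quadrature weights that sum to the study-region area, and a tiny weight (1e-6) on presence points, a common simplification rather than something specified here. `fit_ppm` is a hypothetical helper; it fits by plain IRLS (Newton-Raphson) with no LASSO penalty, and the simulated data are toy values, not taken from any study described here.

```python
import numpy as np


def fit_ppm(X, z, w, n_iter=100, tol=1e-10):
    """Fit a log-linear Poisson point process by weighted Poisson regression.

    Berman-Turner device: X holds covariates (no intercept column) at presence
    and quadrature points, z is 1 for presences and 0 for quadrature points,
    and w are quadrature weights approximating the integral over the region.
    Returns [intercept, slopes...] estimated by IRLS (Newton-Raphson).
    """
    Xd = np.column_stack([np.ones(len(z)), X])  # add intercept column
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    y = z / w                                   # Berman-Turner pseudo-response
    beta = np.zeros(Xd.shape[1])
    beta[0] = np.log(z.sum() / w.sum())         # start at homogeneous intensity
    for _ in range(n_iter):
        eta = Xd @ beta
        mu = np.exp(eta)
        W = w * mu                              # IRLS working weights
        work = eta + (y - mu) / mu              # IRLS working response
        XtW = Xd.T * W
        beta_new = np.linalg.solve(XtW @ Xd, XtW @ work)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


rng = np.random.default_rng(1)

# Quadrature points: centres of a 50 x 50 grid on the unit square,
# each weighted by its cell area (1 / 2500).
g = (np.arange(50) + 0.5) / 50
gx, gy = np.meshgrid(g, g)
quad = np.column_stack([gx.ravel(), gy.ravel()])

# Toy presences from an inhomogeneous Poisson process with
# intensity exp(4 + 1.5 * x), simulated by thinning.
lam_max = np.exp(4 + 1.5)
cand = rng.uniform(size=(rng.poisson(lam_max), 2))
keep = rng.uniform(size=len(cand)) < np.exp(4 + 1.5 * cand[:, 0]) / lam_max
pres = cand[keep]

X = np.concatenate([pres[:, [0]], quad[:, [0]]])  # covariate: x-coordinate
z = np.concatenate([np.ones(len(pres)), np.zeros(len(quad))])
w = np.concatenate([np.full(len(pres), 1e-6), np.full(len(quad), 1 / 2500)])
beta = fit_ppm(X, z, w)
print(beta)  # compare with the simulated values (4, 1.5)
```

Rescaling all weights by a constant c shifts only the fitted intercept (by -log c) and leaves the slopes unchanged, which is consistent with the abstract's statement that the two approaches differ only in a scale-dependent intercept.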
Presence-only data, where information is available concerning species presence but not species absence, are subject to bias due to observers being more likely to visit and record sightings at some locations than others (hereafter “observer bias”). In this paper, we describe and evaluate a model-based approach to accounting for observer bias directly – by modelling presence locations as a function of known observer bias variables (such as accessibility variables) in addition to environmental variables, then conditioning on a common level of bias to make predictions of species occurrence free of such observer bias. We implement this idea using point process models with a LASSO penalty, a new presence-only method related to maximum entropy modelling, that implicitly addresses the “pseudo-absence problem” of where to locate pseudo-absences (and how many). The proposed method of bias-correction is evaluated using systematically collected presence/absence data for 62 plant species endemic to the Blue Mountains near Sydney, Australia. It is shown that modelling and controlling for observer bias significantly improves the accuracy of predictions made using presence-only data, and usually improves predictions as compared to pseudo-absence or “inventory” methods of bias correction based on absences from non-target species. Future research will consider the potential for improving the proposed bias-correction approach by estimating the observer bias simultaneously across multiple species.
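The bias-correction idea (model presences as a function of environmental and observer-bias variables, then predict with bias held at a common level) can be sketched as below. This is a minimal illustration assuming a log-linear intensity with already-fitted coefficients; the coefficient values, the single "distance to nearest road" bias covariate, and the helper names are all hypothetical, and the LASSO-penalised fitting step is not shown.

```python
import numpy as np


def log_intensity(beta0, beta_env, gamma_bias, env, bias):
    """Log-linear intensity with environmental and observer-bias covariates."""
    env = np.asarray(env, dtype=float)
    bias = np.asarray(bias, dtype=float)
    return beta0 + env @ beta_env + bias @ gamma_bias


def bias_corrected_intensity(beta0, beta_env, gamma_bias, env, bias_level):
    """Predict intensity with every site's bias covariates set to one common level."""
    env = np.asarray(env, dtype=float)
    common = np.tile(np.asarray(bias_level, dtype=float), (env.shape[0], 1))
    return np.exp(log_intensity(beta0, beta_env, gamma_bias, env, common))


# Hypothetical fitted coefficients: one environmental covariate and one
# observer-bias covariate (distance to nearest road, km).
beta0, beta_env, gamma_bias = -2.0, np.array([0.8]), np.array([-0.5])
env = np.array([[1.0], [1.0], [2.0]])        # sites 0 and 1 share an environment
dist_road = np.array([[0.0], [4.0], [4.0]])  # but differ in accessibility

raw = np.exp(log_intensity(beta0, beta_env, gamma_bias, env, dist_road))
corrected = bias_corrected_intensity(beta0, beta_env, gamma_bias, env, [0.0])
print(raw)        # site 1 looks less suitable only because it is remote
print(corrected)  # sites 0 and 1 now receive the same prediction
```

The choice of common level (here, zero distance to a road) only rescales all predictions by the same factor, so relative suitability across sites does not depend on it.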
Identification of species' Biologically Important Areas (BIAs) is fundamental to conservation planning, and species distribution models (SDMs) are a powerful tool commonly used for this purpose. Presence-only data are increasingly used to develop SDMs that aid the conservation decision-making process. Presence-only SDMs are particularly attractive for marine species because of the often high logistical and economic costs of obtaining systematic distribution data. However, robust model validation is important for conservation management applications that require accurate and reliable species occurrence data (e.g., spatially explicit risk assessments). Validation is commonly done using a random subset of the data and less commonly with fully independent test data. Here, we apply a spatial block cross-validation (CV) approach to validate a MaxEnt presence-only model using independent presence/absence survey data for a highly mobile marine species (humpback whale, Megaptera novaeangliae) in the Great Barrier Reef (GBR). A MaxEnt model was developed from opportunistic whale sightings (2003-2007) and then used to identify areas of differing habitat suitability (low, medium, high) in which to conduct a systematic line-transect aerial survey (2012) and derive a density surface model. A spatial block CV buffering strategy was used to validate the MaxEnt model, with the opportunistic sightings as training data and the independent aerial survey sightings as test data. Moderate performance measures indicate that MaxEnt was reliable in identifying the distribution patterns of a mobile whale species on its breeding ground, as shown by areas of high density aligning with areas of high habitat suitability. Furthermore, we demonstrate that MaxEnt models can be useful and cost-effective for designing a sampling scheme for systematic surveys that significantly reduces sampling effort. In this study, higher-quality information on whale reproductive class (calf vs. non-calf groups), which the presence-only data lacked, was obtained while sampling only 18% of the GBR World Heritage Area. The validation approach using fully independent data provides greater confidence in the MaxEnt model, which indicates significant overlap with the main breeding ground of humpback whales and the inner shipping route. This is important when evaluating presence-only models for certain conservation management applications, such as spatial risk assessments.
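The abstract does not give the block size or buffer distance, so the following is only a generic sketch of how a buffered spatial block CV can be set up when training data (e.g. opportunistic sightings) and test data (e.g. survey sightings) are separate sets. Square blocks, planar coordinates, a Euclidean buffer, and the helper name `buffered_block_folds` are all assumptions, not the study's exact procedure.

```python
import numpy as np


def buffered_block_folds(train_coords, test_coords, block_size, buffer):
    """Spatial block CV with a buffer (generic sketch, not the study's exact setup).

    Test points are grouped into square blocks of side `block_size`. Each block
    is held out in turn; training points within `buffer` of any held-out test
    point are dropped to reduce spatial dependence between the two sets.
    Yields (block_cell, train_idx, test_idx).
    """
    train_coords = np.asarray(train_coords, dtype=float)
    test_coords = np.asarray(test_coords, dtype=float)
    test_cells = np.floor(test_coords / block_size).astype(int)
    for cell in np.unique(test_cells, axis=0):
        test_idx = np.flatnonzero(np.all(test_cells == cell, axis=1))
        held_out = test_coords[test_idx]
        # Pairwise distances: training points x held-out test points
        d = np.linalg.norm(train_coords[:, None, :] - held_out[None, :, :], axis=2)
        train_idx = np.flatnonzero(d.min(axis=1) > buffer)
        yield tuple(int(c) for c in cell), train_idx, test_idx


# Toy example with random coordinates (hypothetical units, e.g. km)
rng = np.random.default_rng(0)
train = rng.uniform(0, 100, size=(200, 2))  # e.g. opportunistic sightings
test = rng.uniform(0, 100, size=(80, 2))    # e.g. survey sightings
folds = list(buffered_block_folds(train, test, block_size=25.0, buffer=5.0))
for cell, tr, te in folds:
    print(cell, len(tr), len(te))
```

A model would be refitted on `train[tr]` and scored on `test[te]` for each fold; larger buffers give more conservative (less optimistic) performance estimates at the cost of less training data.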