Species distribution modelling (SDM) typically analyses species' presence together with some form of absence information. Ideally absences comprise observations or are inferred from comprehensive sampling. When such information is not available, then pseudo-absences are often generated from the background locations within the study region of interest containing the presences, or else absence is implied through the comparison of presences to the whole study region, e.g. as is the case in Maximum Entropy (MaxEnt) or Poisson point process modelling. However, the choice of which absence information to include can be both challenging and highly influential on SDM predictions (e.g. Oksanen and Minchin, 2002). In practice, the use of pseudo-or implied absences often leads to an imbalance where absences far outnumber presences. This leaves analysis highly susceptible to 'naughty-noughts': absences that occur beyond the envelope of the species, which can exert strong influence on the model and its predictions (Austin and Meyers, 1996). Also known as 'excess zeros', naughty noughts can be estimated via an overall proportion in simple hurdle or mixture models (Martin et al., 2005). However, absences, especially those that occur beyond the species envelope, can often be more diverse than presences. Here we consider an extension to excess zero models. The two-staged approach first exploits the compartmentalisation provided by classification trees (CTs) (as in O'Leary, 2008) to identify multiple sources of naughty noughts and simultaneously delineate several species envelopes. Then SDMs can be fit separately within each envelope, and for this stage, we examine both CTs (as in Falk et al., 2014) and the popular MaxEnt (Elith et al., 2006). We introduce a wider range of model performance measures to improve treatment of naughty noughts in SDM. We retain an overall measure of model performance, the area under the curve (AUC) of the Receiver-Operating Curve (ROC), but focus on its constituent measures of false negative rate (FNR) and false positive rate (FPR), and how these relate to the threshold in the predicted probability of presence that delimits predicted presence from absence. We also propose error rates more relevant to users of predictions: false omission rate (FOR), the chance that a predicted absence corresponds to (and hence wastes) an observed presence, and the false discovery rate (FDR), reflecting those predicted (or potential) presences that correspond to absence. A high FDR may be desirable since it could help target future search efforts, whereas zero or low FOR is desirable since it indicates none of the (often valuable) presences have been ignored in the SDM. For illustration, we chose Bradypus variegatus, a species that has previously been published as an exemplar species for MaxEnt, proposed by Phillips et al. (2006). We used CTs to increasingly refine the species envelope, starting with the whole study region (E0), eliminating more and more potential naughty noughts (E1-E3). When combined with an SDM fit wit...
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.