The use of statistical models to predict the likely occurrence or distribution of species is becoming an increasingly important tool in conservation planning and wildlife management. Evaluating the predictive performance of models using independent data is a vital step in model development. Such evaluation assists in determining the suitability of a model for specific applications, facilitates comparative assessment of competing models and modelling techniques, and identifies aspects of a model most in need of improvement. The predictive performance of habitat models developed using logistic regression needs to be evaluated in terms of two components: reliability or calibration (the agreement between predicted probabilities of occurrence and observed proportions of sites occupied), and discrimination capacity (the ability of a model to correctly distinguish between occupied and unoccupied sites). Lack of reliability can be attributed to two systematic sources, calibration bias and spread. Techniques are described for evaluating both of these sources of error. The discrimination capacity of logistic regression models is often measured by cross-classifying observations and predictions in a two-by-two table, and calculating indices of classification performance. However, this approach relies on the essentially arbitrary choice of a threshold probability to determine whether or not a site is predicted to be occupied. An alternative approach is described which measures discrimination capacity in terms of the area under a relative operating characteristic (ROC) curve relating relative proportions of correctly and incorrectly classified predictions over a wide and continuous range of threshold levels. Wider application of the techniques promoted in this paper could greatly improve understanding of the usefulness, and potential limitations, of habitat models developed for use in conservation planning and wildlife management.
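The two components of predictive performance described above can be illustrated with a minimal sketch. It assumes that `observed` holds 0/1 occupancy records from independent evaluation sites and that `predicted` holds the model's probabilities for the same sites. The function names, the equal-width binning and the sample values are illustrative assumptions, not the procedures of the paper itself. Discrimination is computed as the area under the ROC curve using its rank-based (Mann-Whitney) equivalent, and reliability is summarised by comparing mean predicted probability with the observed proportion of occupied sites in each probability class.

```python
from typing import List, Sequence, Tuple


def roc_auc(observed: Sequence[int], predicted: Sequence[float]) -> float:
    """Area under the ROC curve, computed without choosing a threshold.

    Equivalent to the probability that a randomly chosen occupied site
    receives a higher prediction than a randomly chosen unoccupied site
    (ties count as one half).
    """
    occupied = [p for o, p in zip(observed, predicted) if o == 1]
    unoccupied = [p for o, p in zip(observed, predicted) if o == 0]
    if not occupied or not unoccupied:
        raise ValueError("need at least one occupied and one unoccupied site")
    wins = 0.0
    for p_occ in occupied:
        for p_unocc in unoccupied:
            if p_occ > p_unocc:
                wins += 1.0
            elif p_occ == p_unocc:
                wins += 0.5
    return wins / (len(occupied) * len(unoccupied))


def calibration_table(
    observed: Sequence[int], predicted: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Rows of (mean predicted probability, observed proportion occupied, n)
    for each non-empty equal-width probability class (an assumed binning)."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for o, p in zip(observed, predicted):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top class
        bins[index].append((o, p))
    rows = []
    for members in bins:
        if members:
            n = len(members)
            mean_predicted = sum(p for _, p in members) / n
            proportion_occupied = sum(o for o, _ in members) / n
            rows.append((mean_predicted, proportion_occupied, n))
    return rows


# Hypothetical evaluation data, invented for illustration only.
observed = [1, 1, 0, 1, 0, 0, 1, 0]
predicted = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.55, 0.1]

print(roc_auc(observed, predicted))  # 0.875
for row in calibration_table(observed, predicted, n_bins=2):
    print(row)
```

A well-calibrated model would show observed proportions close to the mean predicted probability in every class; a consistent offset suggests bias, while a slope differing from one suggests a problem of spread.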
Summary

1. Presence-only data, for which there is no information on locations where the species is absent, are common in both animal and plant studies. In many situations, these may be the only data available on a species. We need effective ways to use these data to explore species distribution or species use of habitat.

2. Many analytical approaches have been used to model presence-only data, some inappropriately. We provide a synthesis and critique of statistical methods currently in use to both estimate and evaluate these models, and discuss the critical importance of study design in models where only presence can be identified.

3. Profile or envelope methods exist to characterize environmental covariates that describe the locations where organisms are found. Predictions from profile approaches are generally coarse, but may be useful when species records, environmental predictors and biological understanding are scarce.

4. Alternatively, one can build models to contrast environmental attributes associated with known locations with a sample of random landscape locations, termed either 'pseudo-absences' or 'available'. Great care needs to be taken when selecting random landscape locations, because the way in which they are selected determines the modelling techniques that can be applied.

5. Regression-based models can provide predictions of the relative likelihood of occurrence, and in some situations predictions of the probability of occurrence. The logistic model is frequently applied, but can rarely be used directly to estimate these models; instead, case-control or logistic discrimination should be used depending on the sample design.

6. Cross-validation can be used to evaluate model performance and to assess how effectively the model reflects a quantity proportional to the probability of occurrence. However, more research is needed to develop a single measure or statistic that summarizes model performance for presence-only data.

7. Synthesis and applications. A number of statistical procedures are available to explore patterns in presence-only data; the choice among them depends on the quality of the presence-only data. Presence-only records can provide insight into the vulnerability, historical distribution and conservation status of species. Models developed using these data can inform management. Our caveat is that researchers must be mindful of study design and the biases inherent in presence data, and be cautious in the interpretation of model predictions.
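The cross-validation check mentioned in the summary can be sketched as follows. This is one possible evaluation under stated assumptions, not a procedure specified in the summary: scores from a fitted model are assumed to be available for withheld presence records and for a random sample of 'available' locations, the score range is split into classes holding equal numbers of available locations, and a Spearman rank correlation tests whether withheld presences become more frequent as predicted relative likelihood rises. All function names and data values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantile_edges(available_scores: Sequence[float], n_bins: int) -> List[float]:
    """Interior class edges chosen so each class holds a similar number of
    available locations (an assumed equal-area binning)."""
    ordered = sorted(available_scores)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def presence_counts_by_bin(
    presence_scores: Sequence[float],
    available_scores: Sequence[float],
    n_bins: int = 4,
) -> List[int]:
    """Count withheld presence records in each score class, lowest first."""
    edges = quantile_edges(available_scores, n_bins)
    counts = [0] * n_bins
    for score in presence_scores:
        counts[bisect_right(edges, score)] += 1
    return counts


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for one cross-validation fold, invented for illustration.
available = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9]
withheld_presences = [0.15, 0.35, 0.45, 0.55, 0.7, 0.8, 0.95]

counts = presence_counts_by_bin(withheld_presences, available, n_bins=4)
print(counts)  # [1, 1, 2, 3]
print(spearman(list(range(1, len(counts) + 1)), counts))
```

In a k-fold design the same calculation would be repeated with each fold withheld in turn. A strong positive correlation is consistent with predictions that are proportional to the probability of occurrence, although, as noted in point 6, no single statistic fully summarizes performance for presence-only data.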