How to make use of unlabeled observations in species distribution modeling using point process models

Guilbault, Emy; Renner, Ian; Mahony, Michael; Beh, Eric J.

doi:10.1002/ece3.7411

Cited by 4 publications (2 citation statements)

References 57 publications (80 reference statements)


“…Therefore, data integration frameworks have been developed linking multiple data sources via combined likelihood estimation (Fletcher et al, 2016; Farr et al, 2019). Although unified frameworks remain rare, models using thinned point processes, which remove or retain points according to probabilistic rules, showed superior performance of combining unstructured and structured data sources compared to inferences obtained from single data sources (Dorazio, 2014; Fletcher et al, 2016; Koshkina et al, 2017; Guilbault et al, 2021). A recent hierarchical modeling approach by Renner et al (2019) utilizes multiple data sources while accounting for overfitting and spatial dependence of observations via combined likelihood maximization.…”

Section: Introduction (mentioning)

confidence: 99%

Data-integration of opportunistic species observations into hierarchical modeling frameworks improves spatial predictions for urban red squirrels

et al. 2022


The prevailing trend of increasing urbanization and habitat fragmentation makes knowledge of species’ habitat requirements and distribution a crucial factor in conservation and urban planning. Species distribution models (SDMs) offer powerful toolboxes for discriminating the underlying environmental factors driving habitat suitability. Nevertheless, challenges in SDMs emerge when multiple data sets, often sampled with different intentions and therefore different sampling schemes, are to complement each other and increase predictive accuracy. Here, we investigate the potential of using recent data integration techniques to model potential habitat and movement corridors for Eurasian red squirrels (Sciurus vulgaris) in an urban area. We constructed hierarchical models integrating data sets of different quality, stemming from unstructured wildlife observation campaigns on one side and semi-structured campaigns on the other, in a combined likelihood approach and compared the results to modeling techniques based on only one data source, wherein all models were fit with the same selection of environmental variables. Our study highlights the increasing importance of considering multiple data sets for SDMs to enhance their predictive performance. We finally used Circuitscape (version 4.0.5) on the most robust SDM to delineate suitable movement corridors for red squirrels as a basis for planning road mortality mitigation measures. Our results indicate that even though red squirrels are common, urban habitats are rather small and partially lack connectivity along natural connectivity corridors in Berlin. Thus, additional fragmentation could bring the species closer to its limit to persist in urban environments, where our results can act as a template for conservation and management implications.
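The citation statement above describes thinned point processes as models that "remove or retain points according to probabilistic rules". A minimal sketch of that idea follows, using only the Python standard library. The intensity of 200 points on the unit square, the logistic retention rule, and all function names are assumptions chosen for illustration; they are not taken from any of the cited papers.

```python
import math
import random


def simulate_poisson_points(rate, rng):
    """Simulate a homogeneous Poisson process on the unit square."""
    # Count points by accumulating exponential waiting times until they
    # exceed 1; the resulting count is Poisson(rate) distributed.
    count, elapsed = 0, rng.expovariate(rate)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(rate)
    # Given the count, locations are independent and uniform on the square.
    return [(rng.random(), rng.random()) for _ in range(count)]


def retain_probability(x, y, intercept=1.0, slope=-3.0):
    """Hypothetical retention rule: detection declines with x (logistic)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def thin(points, prob, rng):
    """Independently retain each point with probability prob(x, y)."""
    return [(x, y) for (x, y) in points if rng.random() < prob(x, y)]


rng = random.Random(42)
true_points = simulate_poisson_points(rate=200.0, rng=rng)
observed_points = thin(true_points, retain_probability, rng)
print(len(true_points), len(observed_points))
```

In this sketch the observed pattern is always a subset of the underlying one, and the retained points become sparser toward larger x, which is the kind of observer bias a thinning rule is meant to represent.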


“…The above‐mentioned studies have either used verified data collected on the site level (where the occupancy state of a species is known at a site and not at the individual sample level; Chambert, Waddle, et al., 2018), on aggregated individual sample level using a multinomial model with site‐covariates (Wright et al., 2020) or on individual sample‐level validation data which helps in modelling non‐species identities (morphospecies) to species identities (Spiers et al., 2022). It is also worth stating that some studies have explored accounting for misclassification in abundance (Conn et al., 2013), capture–recapture (Augustine et al., 2020) and mixture (Guilbault et al., 2021) models.…”

Section: Introduction (mentioning)

confidence: 99%

Modelling heterogeneity in the classification process in multi‐species distribution models can improve predictive performance

Adjei, Finstad, Koch et al. 2024

Ecology and Evolution


Species distribution models and maps from large‐scale biodiversity data are necessary for conservation management. One current issue is that biodiversity data are prone to taxonomic misclassifications. Methods to account for these misclassifications in multi‐species distribution models have assumed that the classification probabilities are constant throughout the study. In reality, classification probabilities are likely to vary with several covariates. Failure to account for such heterogeneity can lead to biased prediction of species distributions. Here, we present a general multi‐species distribution model that accounts for heterogeneity in the classification process. The proposed model assumes a multinomial generalised linear model for the classification confusion matrix. We compare the performance of the heterogeneous classification model to that of the homogeneous classification model by assessing how well they estimate the parameters in the model and their predictive performance on hold‐out samples. We applied the model to gull data from Norway, Denmark and Finland, obtained from the Global Biodiversity Information Facility. Our simulation study showed that accounting for heterogeneity in the classification process increased the precision of true species' identity predictions by 30% and accuracy and recall by 6%. Since all the models in this study accounted for misclassification of some sort, there was no significant effect of accounting for heterogeneity in the classification process on the inference about the ecological process. Applying the model framework to the gull dataset did not improve the predictive performance between the homogeneous and heterogeneous models (with parametric distributions) due to the smaller misclassified sample sizes. However, when machine learning predictive scores were used as weights to inform the species distribution models about the classification process, the precision increased by 70%. 
We recommend multiple multinomial regression to be used to model the variation in the classification process when the data contains relatively larger misclassified samples. Machine learning prediction scores should be used when the data contains relatively smaller misclassified samples.
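The abstract above states that the proposed model "assumes a multinomial generalised linear model for the classification confusion matrix", so that classification probabilities vary with covariates. The following is a minimal sketch of one way such a matrix could be built, assuming a multinomial logit (softmax) link and a single covariate `z`. The parameter arrays `alpha` and `beta`, their values, and the function names are hypothetical and are not the authors' parameterization.

```python
import math


def softmax(scores):
    """Convert real-valued scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def confusion_matrix(z, alpha, beta):
    """Row k holds P(reported label j | true species k, covariate z).

    Each row is a multinomial logit with linear predictor
    alpha[k][j] + beta[k][j] * z (an assumed form, for illustration).
    """
    return [
        softmax([a + b * z for a, b in zip(alpha_row, beta_row)])
        for alpha_row, beta_row in zip(alpha, beta)
    ]


# Hypothetical parameters for two species: intercepts favour the correct
# label, while slopes shift probability toward the wrong label as z grows.
alpha = [[2.0, 0.0], [0.0, 2.0]]
beta = [[0.0, 1.0], [1.0, 0.0]]

for z in (0.0, 1.0, 2.0):
    print(z, confusion_matrix(z, alpha, beta))
```

Setting every entry of `beta` to zero recovers a homogeneous classification model, in which the confusion matrix is the same for every value of the covariate.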


A practical approach to making use of uncertain species presence-only data in ecology: Reclassification, regularization methods and observer bias

Guilbault, Renner, Beh et al. 2023

Ecological Informatics


How to make use of unlabeled observations in species distribution modeling using point process models

Abstract: Species distribution modeling has been a popular topic in ecological statistics over the past decade. Many tools and methods have been developed to provide a means to explore the distributions of species through mapping of suitable environments (Inoue et al., 2017;
