In this paper, we investigate whether deep learning models for species classification in camera trap images are well calibrated, i.e. whether the predicted confidence scores can be reliably interpreted as probabilities that the predictions are correct. Additionally, as camera traps are often configured to take multiple photos of the same event, we also explore the calibration of predictions at the sequence level.

Here, we (i) train deep learning models on a large and diverse European camera trap dataset, using five established architectures; (ii) compare their calibration and classification performance on three independent test sets; (iii) measure performance at the sequence level using four approaches to aggregating individual predictions; and (iv) study the effect and practicality of a post-hoc calibration method at both the image and sequence levels.

Our results first suggest that calibration and accuracy are closely intertwined and vary greatly across model architectures. Secondly, we observe that averaging the logits over the sequence before applying softmax normalization emerges as the most effective method for achieving both good calibration and accuracy at the sequence level. Finally, temperature scaling can be a practical solution to further improve calibration, given the generalizability of the optimal temperature across datasets.

We conclude that, with an adequate methodology, deep learning models for species classification can be very well calibrated. This considerably improves the interpretability of the confidence scores and their usability in downstream ecological tasks.
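The sequence-level aggregation and post-hoc calibration described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it averages per-image logits over a sequence, applies temperature scaling (the function names, toy logits, and the temperature value are assumptions for illustration), and normalizes with a softmax to obtain a sequence-level probability vector.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sequence_probs(logits, temperature=1.0):
    # Aggregate per-image logits of shape (n_images, n_classes) into one
    # sequence-level probability vector: average the logits over the
    # sequence, rescale by the temperature, then apply softmax.
    mean_logits = np.mean(logits, axis=0)
    return softmax(mean_logits / temperature)

# Toy example: 3 images of the same event, 4 candidate species.
logits = np.array([[2.0, 0.5, -1.0, 0.1],
                   [1.5, 0.8, -0.5, 0.0],
                   [2.2, 0.3, -0.8, 0.2]])

# A temperature > 1 softens overconfident predictions; in practice the
# optimal value is fitted on a held-out validation set.
probs = sequence_probs(logits, temperature=1.5)
```

Averaging logits before the softmax (rather than averaging per-image probabilities) preserves the relative evidence contributed by each image, which the paper identifies as the most effective aggregation strategy.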