The ability of a researcher to re‐identify (re‐ID) an individual animal upon re‐encounter is fundamental for addressing a broad range of questions in the study of ecosystem function, community and population dynamics and behavioural ecology. Tagging animals during mark and recapture studies is the most common method for reliable animal re‐ID; however, camera traps are a desirable alternative, as they require less labour and far less intrusion while providing prolonged and continuous monitoring of an environment. Despite these advantages, human analyses of camera trap and video data for re‐ID are criticized for biases related to human judgement and for inconsistencies between analyses. In this review, we describe a brief history of camera traps for re‐ID, present a collection of computer vision feature engineering methodologies previously used for animal re‐ID, provide an introduction to the underlying mechanisms of deep learning relevant to animal re‐ID, highlight the success of deep learning methods for human re‐ID, describe the few ecological studies currently utilizing deep learning for camera trap analyses and offer our predictions for near-future methodologies based on the rapid development of deep learning methods. For decades, ecologists with expertise in computer vision have successfully utilized feature engineering to extract meaningful features from camera trap images, improving the statistical rigour of individual comparisons and removing human bias from their camera trap analyses. Recent years have witnessed the emergence of deep learning systems that re‐identify humans from image and video data with near-perfect accuracy. Despite this success, ecologists have yet to utilize these approaches for animal re‐ID. By utilizing novel deep learning methods for object detection and similarity comparisons, ecologists can extract animals from image/video data and train deep learning classifiers to re‐ID animal individuals beyond the capabilities of a human observer. This methodology will allow ecologists with camera/video trap data to re‐identify individuals that exit and re‐enter the camera frame. Our expectation is that this is just the beginning of a major trend that could stand to revolutionize the analysis of camera trap data and, ultimately, our approach to animal ecology.
Ecological camera traps are increasingly used by wildlife biologists to unobtrusively monitor an ecosystem's animal populations. However, manual inspection of the images produced is expensive, laborious, and time‐consuming. The success of deep learning systems using camera trap images has previously been explored only in preliminary studies. These studies, however, are lacking in practicality: they are primarily focused on extremely large datasets, often millions of images, and pay little to no attention to performance when species must be identified at new locations not seen during training. Our goal was to test the capabilities of deep learning systems trained on camera trap images using modestly sized training data, to compare performance when considering unseen background locations, and to quantify how performance degrades with smaller training sets, providing a guideline relating data requirements to performance expectations. We use a dataset provided by Parks Canada containing 47,279 images collected from 36 unique geographic locations across multiple environments. Images represent 55 animal species and human activity, with high class imbalance. We trained, tested, and compared the capabilities of six deep learning computer vision networks using transfer learning and image augmentation: DenseNet201, Inception‐ResNet‐V2, InceptionV3, NASNetMobile, MobileNetV2, and Xception. On "trained" locations, DenseNet201 performed best, with 95.6% top‐1 accuracy, showing promise for deep learning methods in smaller-scale research efforts. Using trained locations, classes with fewer than 500 training images had low and highly variable recall of 0.750 ± 0.329, while classes with over 1,000 images had high and stable recall of 0.971 ± 0.0137. Models tasked with classifying species from untrained locations were less accurate, with DenseNet201 again performing best, at 68.7% top‐1 accuracy. Finally, we provide an open repository where ecologists can insert their own image data to train and test custom species detection models for their desired ecological domain.
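The transfer-learning and image-augmentation setup described above can be illustrated with a minimal Keras sketch. This is not the authors' exact pipeline: the image size, augmentation ranges, classification head, and dataset paths are assumptions chosen for illustration, and only the general approach (an ImageNet-pretrained DenseNet201 backbone with a new softmax head) follows the abstract.

```python
# Minimal transfer-learning sketch (Keras). Image size, augmentation ranges,
# and the classification head are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet201

NUM_CLASSES = 55          # species/human-activity classes in the Parks Canada data
IMG_SIZE = (224, 224)

# On-the-fly image augmentation, active only during training.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

# ImageNet-pretrained backbone, frozen for the initial transfer-learning phase.
backbone = DenseNet201(include_top=False, weights="imagenet",
                       input_shape=IMG_SIZE + (3,), pooling="avg")
backbone.trainable = False

inputs = layers.Input(shape=IMG_SIZE + (3,))
x = augment(inputs)
x = tf.keras.applications.densenet.preprocess_input(x)
x = backbone(x, training=False)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = models.Model(inputs, outputs)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy",
                       tf.keras.metrics.SparseTopKCategoricalAccuracy(k=5)])

# train_ds / val_ds would be tf.data.Dataset objects built from labelled
# camera trap images, e.g. via tf.keras.utils.image_dataset_from_directory.
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```

Swapping `DenseNet201` for `InceptionV3`, `MobileNetV2`, or any other `tf.keras.applications` backbone reproduces the kind of architecture comparison reported above.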
Deep learning methods for computer vision show promise for automating the analysis of camera trap images. Ecological camera traps are a common approach for monitoring an ecosystem's animal population, as they provide continual insight into an environment without being intrusive; however, the analysis of camera trap images is expensive, labour intensive, and time consuming. Object detection methods have shown success when trained on large labelled data sets, and recent advances suggest they can also automate camera trap analysis. Here, we test whether object detection is applicable to realistically sized ecological data sets when utilizing transfer learning, by training and comparing two deep learning object detection classifiers, Faster R-CNN and YOLO v2.0, to identify, quantify, and localize animal species within camera trap images using the Reconyx Camera Trap and the self-labelled Gold Standard Snapshot Serengeti data sets. Faster R-CNN outperformed YOLO v2.0, with average accuracies of 93.0% and 76.7% on the two data sets, respectively. Our findings are a promising step towards automating the laborious task of labelling camera trap images, which can be used to improve our understanding of the population dynamics of ecosystems across the planet.
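To make the detection step concrete, the sketch below runs a pretrained Faster R-CNN on a single camera trap frame using torchvision. This is an assumption-laden illustration rather than the study's pipeline: the weights are COCO-pretrained (the study would fine-tune on labelled camera trap boxes via transfer learning), and the file name and confidence threshold are hypothetical.

```python
# Illustrative Faster R-CNN inference on one camera trap image (torchvision >= 0.13).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# COCO-pretrained detector; in practice the head would be fine-tuned on
# labelled camera trap bounding boxes (transfer learning).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("camera_trap_frame.jpg").convert("RGB")   # hypothetical file
with torch.no_grad():
    predictions = model([to_tensor(image)])[0]

# Keep confident detections: boxes localize each animal, labels identify the
# class, and the number of boxes quantifies how many animals are in frame.
keep = predictions["scores"] > 0.8
boxes = predictions["boxes"][keep]
labels = predictions["labels"][keep]
print(f"{len(boxes)} detections", labels.tolist())
```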
High diversity is often poorly explained by trait-based deterministic models, in part because stochastic processes also influence community assembly. Testing how deterministic and stochastic processes combine to regulate diversity, however, has been limited by the spatial complexity of these interactions. Here, we demonstrate how spatial variability in small-mammal predation on plants, mostly by granivory, results in fine-scale switching between deterministically and stochastically regulated plant community assembly in an otherwise environmentally homogeneous tallgrass prairie. We initiated assembly with the uniform application of a 24-species mixture of prairie grasses and forbs, thereby setting the maximum level of diversity (γ-diversity). In field edges with higher densities of small mammals, traits reducing seed palatability deterministically produced homogeneous subsets of less palatable plant species within the first few months after planting (low α- and β-diversity). As small-mammal densities decreased in more open areas, assembly unfolded stochastically on the basis of which planted species happened to land at a given location (high α- and β-diversity). We used randomization models to validate that this higher β-diversity was explained by true differences in community structure among plots rather than by the hidden effects of increasing α-diversity. The net effect at the site level was a spatially structured array of prairie species, including a positive relationship between diversity and environmental suitability associated with reduced predation intensity.
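One common form of the randomization test mentioned above can be sketched as follows; this is a generic permutation null model, not necessarily the authors' exact procedure, and the community matrix is fabricated purely for illustration. Species occurrences are shuffled within plots so that each plot's richness (α-diversity) is preserved, and the observed among-plot dissimilarity (β-diversity) is compared with the null distribution.

```python
# Permutation null model for beta diversity: shuffle which species occupy each
# plot while preserving per-plot richness, then compare observed among-plot
# dissimilarity with the null distribution. Data below are fabricated.
import numpy as np

rng = np.random.default_rng(0)
n_plots, n_species = 20, 24
community = (rng.random((n_plots, n_species)) < 0.3).astype(int)  # plots x species presences

def mean_jaccard_dissimilarity(mat):
    """Mean pairwise Jaccard dissimilarity among plots (a simple beta-diversity measure)."""
    d = []
    for i in range(len(mat)):
        for j in range(i + 1, len(mat)):
            shared = np.sum(mat[i] & mat[j])
            union = np.sum(mat[i] | mat[j])
            d.append(1 - shared / union if union else 0.0)
    return np.mean(d)

observed = mean_jaccard_dissimilarity(community)

null = []
for _ in range(999):
    # Permuting species identities within each plot keeps alpha diversity fixed.
    shuffled = np.array([rng.permutation(row) for row in community])
    null.append(mean_jaccard_dissimilarity(shuffled))

p_value = (np.sum(np.array(null) >= observed) + 1) / (len(null) + 1)
print(f"observed beta = {observed:.3f}, null mean = {np.mean(null):.3f}, p = {p_value:.3f}")
```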
Deep learning has become the standard methodology to approach computer vision tasks when a large amount of labeled data is available. One area where traditional deep learning approaches fail to perform is one-shot learning tasks where a model must make accurate classifications after seeing only one example image. Here, we measure the capabilities of five Siamese similarity comparison networks based on the AlexNet, VGG-19, DenseNet201, MobileNetV2, and InceptionV3 architectures considering the challenging one-shot learning task of animal re-identification. We consider five data sets corresponding to five different species: humans, chimpanzees, humpback whales, fruit flies, and octopus, each with their own unique set of challenges. Using a five-fold validation split, we demonstrate that each network structure was able to successfully re-identify animal individuals, with DenseNet201 performing optimally with 89.7, 75.5, 61.4, 79.3 and 92.2 percentage accuracy on the human, chimpanzee, humpback whale, fruit fly, and octopus data sets respectively-without any species-specific modifications. Our results demonstrate that similarity comparison networks can achieve accuracies beyond humanlevel performance for the task of animal re-identification. The ability of a researcher to re-identify an animal individual upon re-encounter is fundamental for addressing a broad range of questions in the study of population dynamics and community/behavioural ecology. Our expectation is that similarity comparison networks are the beginning of a major trend that could stand to revolutionize animal re-identification from camera trap data.
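The general shape of such a similarity comparison network can be sketched in Keras as below. This is a minimal, assumed configuration, not the study's architecture: the embedding width, the absolute-difference head, and the input size are illustrative choices; only the overall Siamese structure (two images passed through one shared embedding network, then scored as same/different individual) follows the abstract.

```python
# Minimal Siamese similarity-comparison sketch (Keras). Embedding width,
# input size, and the same/different head are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet201

IMG_SHAPE = (224, 224, 3)

# Shared embedding branch: the same weights encode both images.
base = DenseNet201(include_top=False, weights="imagenet",
                   input_shape=IMG_SHAPE, pooling="avg")
embed = models.Sequential([base, layers.Dense(128)], name="embedding")

img_a = layers.Input(shape=IMG_SHAPE)
img_b = layers.Input(shape=IMG_SHAPE)
emb_a, emb_b = embed(img_a), embed(img_b)

# Absolute difference of the two embeddings, scored as P(same individual).
diff = layers.Subtract()([emb_a, emb_b])
diff = layers.Lambda(tf.abs)(diff)
same_prob = layers.Dense(1, activation="sigmoid")(diff)

siamese = models.Model([img_a, img_b], same_prob)
siamese.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# One-shot re-identification: compare a query image against one reference image
# per known individual and assign the identity with the highest similarity score.
# scores = siamese.predict([query_batch, reference_batch])
```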