Visual analysis of complex fish habitats is an important step towards sustainable fisheries for human consumption and environmental protection. Deep learning methods have shown great promise for scene analysis when trained on large-scale datasets. However, current datasets for fish analysis tend to focus on the classification task within constrained, plain environments that do not capture the complexity of underwater fish habitats. To address this limitation, we present DeepFish, a benchmark suite with a large-scale dataset for training and testing methods on several computer vision tasks. The dataset consists of approximately 40 thousand images collected underwater from 20 habitats in the marine environments of tropical Australia. The dataset originally contained only classification labels, so we collected point-level and segmentation labels to build a more comprehensive fish-analysis benchmark. These labels enable models to learn to automatically monitor fish counts, identify fish locations, and estimate fish sizes. Our experiments provide an in-depth analysis of the dataset characteristics and a performance evaluation of several state-of-the-art approaches on our benchmark. Although models pre-trained on ImageNet perform well on this benchmark, there is still room for improvement. This benchmark therefore serves as a testbed to motivate further development in the challenging domain of underwater computer vision.

Monitoring fish in their natural habitat is an important step towards sustainable fisheries. In the state of New South Wales, Australia, for example, the fisheries industry was valued at more than 100 million Australian dollars in 2012-2013 [14]. Effective monitoring can provide information about which areas require protection and restoration to maintain healthy fish populations for both human consumption and environmental protection.
Having a system that can automatically perform comprehensive monitoring can significantly reduce labour costs and increase efficiency. Such a system can have a large positive sustainability impact and improve our ability to maintain a healthy ecosystem. Deep learning methods have consistently achieved state-of-the-art results in image analysis, and methods based on deep neural networks have achieved top performance for a variety of applications, including ecological monitoring with camera-trap data. One reason behind this success is that these methods can leverage large-scale, publicly available datasets such as ImageNet [6] and COCO [24] for training before being fine-tuned for a new application. A particularly challenging application is the automatic analysis of underwater fish habitats, which demands a comprehensive, accurate computer vision system. Considerable research effort has therefore been put towards developing systems for understanding complex marine environments and distinguishing between a diverse set of fish species, based on publicly available fish datasets [1,3,8,15,35]. However, these fish datasets are small and do not fully capture the complexity of underwater fish habitats.
Given a sufficiently large training dataset, it is relatively easy to train a modern convolutional neural network (CNN) as the required image classifier. However, for the task of fish classification and/or fish detection, a CNN trained to detect or classify particular fish species in particular background habitats exhibits much lower accuracy when applied to new or unseen fish species and/or fish habitats. In practice, the CNN therefore needs to be continuously fine-tuned to handle new project-specific fish species or habitats. In this work we present a labelling-efficient method of training a CNN-based fish detector (the Xception CNN was used as the base) on a relatively small number (4,000) of project-domain underwater fish/no-fish images from 20 different habitats. Additionally, 17,000 known-negative (that is, fish-free) general-domain (VOC2012) above-water images were used. Two publicly available fish-domain datasets supplied an additional 27,000 above-water and underwater positive (fish) images. Using this multi-domain collection of images, the trained Xception-based binary (fish/no-fish) classifier achieved 0.17% false positives and 0.61% false negatives on the project's 20,000 negative and 16,000 positive holdout test images, respectively. The area under the ROC curve (AUC) was 99.94%.
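The reported AUC of 99.94% is a threshold-free summary of the classifier's scores over the positive and negative holdout sets. One way to compute it is the Mann-Whitney formulation: the probability that a randomly chosen positive image outscores a randomly chosen negative one. The sketch below uses made-up scores purely for illustration, not the paper's data:

```python
def roc_auc(neg_scores, pos_scores):
    """ROC AUC via the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs where the positive scores higher
    (ties count as half a win)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Illustrative scores only (higher score = more "fish-like").
neg = [0.10, 0.20, 0.30, 0.40]   # true no-fish images
pos = [0.35, 0.60, 0.80, 0.90]   # true fish images
auc = roc_auc(neg, pos)          # 15 of 16 pairs ranked correctly -> 0.9375
```

For the large holdout sets described above (20,000 negatives, 16,000 positives), a rank-based O(n log n) computation would be used in practice rather than this quadratic pairwise loop.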
A total of 1,072 Asian seabass or barramundi (Lates calcarifer) were harvested at two different locations in Queensland, Australia. Each fish was digitally photographed and weighed. A subsample of 200 images (100 from each location) was manually segmented to extract the fish-body area (S, in cm^2), excluding all fins. After scaling the segmented images to 1 mm per pixel, the fish mass values (M, in grams) were fitted by a single-factor model M = a·S^1.5.
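With the exponent fixed at 1.5 (consistent with length-weight allometry, since area scales as length squared and mass as length cubed), the single-factor model is linear in its one coefficient and admits a closed-form least-squares fit. A minimal sketch, using synthetic area-weight pairs and a hypothetical coefficient rather than the study's data:

```python
def fit_single_factor(areas_cm2, masses_g, exponent=1.5):
    """Least-squares fit of the single-factor model M = a * S**exponent.

    With the exponent fixed, the model is linear in the coefficient a,
    so the closed-form solution is a = sum(M_i * x_i) / sum(x_i^2),
    where x_i = S_i**exponent.
    """
    x = [s ** exponent for s in areas_cm2]
    num = sum(m * xi for m, xi in zip(masses_g, x))
    den = sum(xi * xi for xi in x)
    return num / den

# Synthetic check: data generated exactly from M = 0.02 * S**1.5,
# so the fit should recover a = 0.02 (the coefficient is hypothetical).
areas = [100.0, 200.0, 400.0, 800.0]       # body areas, cm^2
masses = [0.02 * s ** 1.5 for s in areas]  # masses, grams
a_hat = fit_single_factor(areas, masses)
```

On real measurements the fit would typically be done on all available area-weight pairs, with the recovered coefficient then applied to areas extracted from segmentation masks of new images.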
Understanding and modelling how fish respond to climate change, habitat degradation and fishing pressure are critical for environmental protection and are crucial steps towards ensuring sustainable natural fisheries that can support ever-growing human consumption (Zarco-Perello & Enríquez, 2019). Effective monitoring is a vital first step underpinning decision-support mechanisms for identifying problems and planning actions to preserve and restore habitats. However, there is still a gap between the complexity of marine ecosystems and the available monitoring mechanisms. Marine scientists use underwater cameras to record, model and understand fish habitats and fish behaviour. Remote underwater video (RUV) recording in marine applications (Zarco-Perello & Enríquez, 2019) …
Approximately 2,500 weights and corresponding images of harvested Lates calcarifer (Asian seabass or barramundi) were collected at three different locations in Queensland, Australia. Two instances of the LinkNet-34 segmentation convolutional neural network (CNN) were trained. The first was trained on 200 manually segmented fish masks with fins and tails excluded; the second was trained on 100 whole-fish masks. The two CNNs were applied to the remaining images and yielded automatically segmented masks. One-factor and two-factor simple mathematical weight-from-area models were fitted on 1,072 area-weight pairs from the first two locations, where area values were extracted from the automatically segmented masks. When applied to 1,400 test images (from the third location), the one-factor whole-fish mask model achieved the best mean absolute percentage error (MAPE), MAPE = 4.36%. Direct weight-from-image regression CNNs were also trained, of which the no-fins CNN performed best on the test images with MAPE = 4.28%.
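MAPE, the evaluation metric reported for both the weight-from-area models and the regression CNNs, is the mean of the absolute relative errors expressed as a percentage. A minimal sketch with hypothetical harvest weights (not the study's data):

```python
def mape(true_vals, pred_vals):
    """Mean absolute percentage error, in percent:
    100 * mean(|(true - pred) / true|)."""
    errs = [abs((t - p) / t) for t, p in zip(true_vals, pred_vals)]
    return 100.0 * sum(errs) / len(errs)

# Hypothetical true vs. predicted fish weights in grams.
true_g = [950.0, 1200.0, 800.0]
pred_g = [1000.0, 1150.0, 820.0]
err = mape(true_g, pred_g)  # roughly 3.98%
```

Because the error is relative, a MAPE of about 4% corresponds to roughly a 40 g error on a 1 kg fish, which makes the metric easy to interpret across fish of different sizes.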