Image-based methods for species identification offer cost-efficient solutions for biomonitoring. This is particularly relevant for invertebrate studies, where bulk samples often represent insurmountable workloads for sorting, identifying, and counting individual specimens. On the other hand, image-based classification using deep learning tools have strict requirements for the amount of training data, which is often a limiting factor. Here, we examine how classification accuracy increases with the amount of training data using the BIODISCOVER imaging system constructed for image-based classification and biomass estimation of invertebrate specimens. We use a balanced dataset of 60 specimens of each of 16 taxa of freshwater macroinvertebrates to systematically quantify how classification performance of a convolutional neural network (CNN) increases for individual taxa and the overall community as the number of specimens used for training is increased. We show a striking 99.2% classification accuracy when the CNN (EfficientNet-B6) is trained on 50 specimens of each taxon, and also how the lower classification accuracy of models trained on less data is particularly evident for morphologically similar species placed within the same taxonomic order. Even with as little as 15 specimens used for training, classification accuracy reached 97%. Our results add to a recent body of literature showing the huge potential of image-based methods and deep learning for specimen-based research, and furthermore offers a perspective to future automatized approaches for deriving ecological data from bulk arthropod samples.
We feel that some of the results and conclusions in this paper need some comments. 1) The authors claim on p. 59 that their method clearly stands out as the first direct and elegant solution to the problem of the finite dipping dike in gravity prospecting. This is, in fact, not true, since Sharma et al (1970) arrived at exactly the same formulas. In our opinion their method of interpretation seems more realistic in many ways than the authors. For instance they find the depth to the dike from the slope of a straight line fitted to a spectral quantity, where the authors make use of selected points of the spectrum. The latter method seems to be more sensitive to noise than the former, where the straight line is fitted to the spectral quantity over some frequency range, thus some kind of averaging process is performed.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.