2020
DOI: 10.1186/s13636-020-00175-3
|View full text |Cite
|
Sign up to set email alerts
|

Ensemble of convolutional neural networks to improve animal audio classification

Abstract: Recently, deep learning classifiers have proven even more robust in pattern recognition and classification than have texture analysis techniques. With the broad availability of relatively inexpensive Graphics Processing Units (GPUs), many researchers have begun applying deep learning techniques to visual representations of acoustic traces. Preselected or handcrafted descriptors, such as LBP, are not necessary for deep learners since they learn salient features during the training phase. Deep learners, moreover… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
22
0

Year Published

2020
2020
2024
2024

Publication Types

Select...
5
3

Relationship

0
8

Authors

Journals

citations
Cited by 56 publications
(26 citation statements)
references
References 81 publications
0
22
0
Order By: Relevance
“…Nanni et al [22] proposed a set of classifiers that work similarly using taxonomy and parameter settings on different animal audio datasets. To create this general-purpose ensemble, they experimented with a huge amount of finely tuned Convolutional Neural Networks (CNNs) already trained for a variety of audio classification tasks.…”
Section: Related Workmentioning
confidence: 99%
See 1 more Smart Citation
“…Nanni et al [22] proposed a set of classifiers that work similarly using taxonomy and parameter settings on different animal audio datasets. To create this general-purpose ensemble, they experimented with a huge amount of finely tuned Convolutional Neural Networks (CNNs) already trained for a variety of audio classification tasks.…”
Section: Related Workmentioning
confidence: 99%
“…The presentation measures have been clarified as pursues. This setup is connected for RBFNN [22], PLMC [23], RARE [25], and the proposed MR-WOA-SVM calculations.…”
Section: Performance and Comparative Analysismentioning
confidence: 99%
“…The common property they share is the architecture, which is based on convolutional neural networks (CNN) for features extraction and fully connected layers for classification [ 22 ]. The authors of [ 13 , 14 , 16 , 20 , 23 ] applied 1D convolutions with raw audio, whereas [ 13 , 15 , 17 , 18 , 24 , 25 ] used 2D convolutions with special audio features such as mel frequency cepstral coefficients (MFCCs) [ 26 ]. According to [ 13 ], both approaches are effective, but MFCCs are a great approximation to the human voice that help to discard background noise [ 26 ].…”
Section: Introductionmentioning
confidence: 99%
“…Since audio is a one-dimensional signal, their CNN architecture is also based on one-dimensional filters. Second, [ 13 , 15 , 17 , 18 , 24 , 25 , 31 ] applied various audio features such as mel frequency cepstral coefficients (MFCCs) [ 26 ] or Spectrograms with 2D convolutions. Since these features have a two-dimensional representation, CNN filters have the corresponding shape.…”
Section: Introductionmentioning
confidence: 99%
“…This table is a representative reference for the animal speech classification accuracy of the existing deep learning technologies. Additionally, exhaustive tests have been performed on the fusion between an ensemble of handcrafted descriptors and an ensemble system based on a CNN [48], and the possibility was higher for the classification performance of the CNN-based ensemble system. Recent attempts have been made to analyze livestock voices in relation to animal welfare in livestock facilities.…”
Section: Discussionmentioning
confidence: 99%