2018
DOI: 10.1101/437004
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Deep Convolutional Network for Animal Sound Classification and Source Attribution using Dual Audio Recordings

Abstract: We introduce an end-to-end feedforward convolutional neural network that is able to reliably classify the source and type of animal calls in a noisy environment using two streams of audio data after being trained on a dataset of modest size and imperfect labels. The data consists of audio recordings from captive marmoset monkeys housed in pairs, with several other cages nearby. Our network can classify both the call type and which animal made it with a single pass through a single network using raw spectrogram… Show more

Help me understand this report
View published versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

0
13
0

Year Published

2020
2020
2022
2022

Publication Types

Select...
4
2
1

Relationship

1
6

Authors

Journals

citations
Cited by 7 publications
(13 citation statements)
references
References 22 publications
0
13
0
Order By: Relevance
“…IOR was 0.86 for whether a call occurred and 0.91 for call type. This data was then used to train a neural network for auto-detection [26]. The remaining two pairs were annotated with the neural network and corrected by one observer.…”
Section: Discussionmentioning
confidence: 99%
“…IOR was 0.86 for whether a call occurred and 0.91 for call type. This data was then used to train a neural network for auto-detection [26]. The remaining two pairs were annotated with the neural network and corrected by one observer.…”
Section: Discussionmentioning
confidence: 99%
“…As shown, the default setting applies some amount of over-and undersampling, while not completely balancing distributions. The reasons were (a) decreasing training time for the initial experiments by reducing the data set size, (b) ensuring that the extreme imbalance in the training set does not prevent training convergence [19], and (c) to imitate the default setting commonly used in animal call detection systems, which usually start with already undersampled databases [10,12,14].…”
Section: Resamplingmentioning
confidence: 99%
“…In the annual DCASE-challenges, deep-learning based systems became popular between 2016 and 2017 [6][7][8][9]. Research teams working on automatic animal call detection particularly adapted convolutional neural networks (CNNs) with spectrogram inputs: Bergler et al [10] applied Res-Net [11] variants for detection of orca calls in long-term recordings; Bjorck et al [12] applied Dense-Net CNNs [13] for detecting African forest elephants with PAM; Oikarinen et al [14] applied siamese CNNs with stereo inputs to the detection of various marmoset monkey calls. Aodha et al [15] investigated CNNs for bat detection.…”
Section: Introductionmentioning
confidence: 99%
See 1 more Smart Citation
“…In the field of audio classification and speech recognition, to the best of our knowledge, this is the first library specifically designed for audio data augmentation. Audio data augmentation techniques fall into two different categories, depending on whether they are directly applied to the audio signal [5] or to a spectrogram generated from the audio signal [6]. We propose 15 algorithms to augment raw audio data and 8 methods to augment spectrogram data.…”
Section: Introductionmentioning
confidence: 99%