2021
DOI: 10.1121/10.0004258

Data augmentation for the classification of North Atlantic right whales upcalls

Abstract: Passive acoustic monitoring (PAM) is a useful technique for monitoring marine mammals. However, the quantity of data collected through PAM systems makes automated algorithms for detecting and classifying sounds essential. Deep learning algorithms have shown great promise in recent years, but their performance is limited by the lack of sufficient amounts of annotated data for training the algorithms. This work investigates the benefit of augmenting training datasets with synthetically generated samples when tra…

Cited by 17 publications (15 citation statements)
References 16 publications
“…Interestingly, none of the fish acoustic studies to date (according to our literature search) mentioned using image augmentation to train their classifier; however, image augmentation has been used to increase the training data size and classification performance when labeling other animal calls [75,77,78]. For example, Padovese et al [79] used image augmentation to generate synthetic calls to increase training data size resulting in increased classifier recall and precision for labeling North Atlantic right whale (Eubalaena glacialis) upcalls. Rasmussen and Širović [80] used scaling and translation augmentation to prevent their classifier from overfitting during the training process.…”
Section: Resnet-50 Classifier (mentioning)
confidence: 99%
“…In recent years, the focus has shifted towards spectrogram data augmentation, where data augmentation is applied to the spectrogram features of audio. Techniques in this category include time warping, frequency masking, and time masking [25,26], which have yielded promising results. Building upon this foundation, enhanced spectrogram augmentation methods like Filteraugment [27] and SpecSub [28] have been introduced.…”
Section: Introduction (mentioning)
confidence: 99%
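
The frequency masking and time masking mentioned in the statement above can be illustrated with a minimal sketch. It assumes a spectrogram stored as a NumPy array of shape (frequency bins, time frames); the function name `mask_spectrogram` and its width parameters are hypothetical, chosen for illustration, and are not taken from any of the cited works.

```python
import numpy as np


def mask_spectrogram(spec, max_freq_width=8, max_time_width=16, rng=None):
    """Return a copy of `spec` with one random frequency band and one
    random time span set to zero.

    `spec` is assumed to have shape (freq_bins, time_frames). The default
    widths are illustrative, not values from the cited papers.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    n_freq, n_time = out.shape

    # Frequency masking: zero a horizontal band of random height.
    f_width = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f_start = int(rng.integers(0, n_freq - f_width + 1))
    out[f_start:f_start + f_width, :] = 0.0

    # Time masking: zero a vertical strip of random width.
    t_width = int(rng.integers(0, min(max_time_width, n_time) + 1))
    t_start = int(rng.integers(0, n_time - t_width + 1))
    out[:, t_start:t_start + t_width] = 0.0

    return out


if __name__ == "__main__":
    spec = np.ones((64, 128))
    augmented = mask_spectrogram(spec, rng=np.random.default_rng(0))
    print(augmented.shape, int((augmented == 0).sum()), "masked cells")
```

Each call draws a new mask, so applying the function once per training epoch yields a different view of the same labeled sample. Time warping is omitted here because it needs an interpolation step.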
“…Unsurprisingly, some traditional music data augmentation methods, such as pitch shifting, time stretching, and adding background noise, have proven effective at this classification task. When synthesizing dolphin calls, care should be taken to apply augmentations to the audio signal rather than to the spectrograms since altering the spectrogram could distort the time-frequency patterns of dolphin whistles, which would result in the semantic integrity of the labels being compromised [29,34]. In [29], primitive shapes were interjected into the audio signal to generate realistic ambient sounds in negative samples, and classical computer vision methods were used to create synthetic time-frequency whistles, which replaced the training data.…”
Section: Introduction (mentioning)
confidence: 99%
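
As one example of a waveform-level augmentation of the kind this statement recommends, the sketch below mixes background noise into an audio signal at a chosen signal-to-noise ratio. The function name `add_noise_at_snr` and the test signal are assumptions made for illustration; this is not the procedure used in the cited studies.

```python
import numpy as np


def add_noise_at_snr(signal, noise, snr_db):
    """Mix `noise` into `signal` so that the mixture has the requested
    signal-to-noise ratio in decibels.

    Both inputs are 1-D arrays of audio samples; `noise` must be at least
    as long as `signal` and is truncated to the same length.
    """
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as signal")
    noise = noise[:len(signal)]

    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise must not be silent")

    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the noise power.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = np.sqrt(target_noise_power / noise_power)
    return signal + scale * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 8000, endpoint=False)
    tone = np.sin(2 * np.pi * 100 * t)  # stand-in for a recorded call
    mixed = add_noise_at_snr(tone, rng.standard_normal(8000), snr_db=10.0)
    print(mixed.shape)
```

Because the noise is added before any spectrogram is computed, the time-frequency contour of the call itself is left unaltered, which is the concern the statement raises about spectrogram-level edits.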