2018
DOI: 10.1007/978-3-319-99579-3_71

A Perceptually Inspired Data Augmentation Method for Noise Robust CNN Acoustic Models

Cited by 18 publications (13 citation statements)
References 26 publications

“…On the other hand, both the size and position of the frequency masks in SpecAugment are chosen stochastically, and differ for every input in the minibatch. More ideas for structurally omitting frequency data of spectrograms have been discussed in [50].…”
Section: Discussion (mentioning)
confidence: 99%
“…DNN with examples-based enhancement [28]: 26.8%; CNN with channel dropout [20]: 26.8%; CNN with data augmentation [29]: 25.6%; GMM with auditory spectral enhancement [30]: 25.5%; TDNN with Gabor filters, multi-band processing and channel dropout [19]: 25.0%; Current paper: 24.3%…”
Section: Methods Wers (mentioning)
confidence: 99%
“…However, as a side effect, deep learning-based models require a large amount of labeled training data to combat overfitting and ensure high accuracy, especially for speech recognition tasks with few training data. Therefore, a lot of data augmentation methods for ASR [7,8,9,10,11,12] were proposed, mainly on augmenting speech data. For example, speed perturbation [7], pitch adjust [8], adding noise [9] and vocal tract length perturbation increases the quantity of speech data by adjusting the speed or pitch of the audio, or by adding noisy audio on the original clean audio, or by transforming spectrograms.…”
Section: Introduction (mentioning)
confidence: 99%
“…Therefore, a lot of data augmentation methods for ASR [7,8,9,10,11,12] were proposed, mainly on augmenting speech data. For example, speed perturbation [7], pitch adjust [8], adding noise [9] and vocal tract length perturbation increases the quantity of speech data by adjusting the speed or pitch of the audio, or by adding noisy audio on the original clean audio, or by transforming spectrograms. Recently, SpecAugment [10] was proposed to mask the mel-spectrogram along the time and frequency axes, and achieve good improvements on recognition accuracy.…”
Section: Introduction (mentioning)
confidence: 99%
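
Several of the statements above describe masking a spectrogram along the frequency and time axes, with the size and position of each mask drawn at random for every input in the minibatch. The following is a minimal sketch of that idea only, not the implementation from SpecAugment or from the cited paper. The function names, the default mask widths, the use of one mask per axis, and the choice to fill masked cells with zero are all assumptions made for illustration.

```python
import random


def mask_spectrogram(spec, max_freq_width, max_time_width, rng):
    """Return a copy of `spec` (a list of mel-band rows, each a list of
    frame values) with one random frequency mask and one random time mask.
    Hypothetical helper: masked cells are set to 0.0 by assumption."""
    n_mels, n_frames = len(spec), len(spec[0])
    masked = [row[:] for row in spec]  # copy so the input stays untouched

    # Frequency mask: random width, then a random start that keeps it in range.
    f = rng.randint(0, min(max_freq_width, n_mels))
    f0 = rng.randint(0, n_mels - f)
    for m in range(f0, f0 + f):
        masked[m] = [0.0] * n_frames

    # Time mask: same procedure along the frame axis.
    t = rng.randint(0, min(max_time_width, n_frames))
    t0 = rng.randint(0, n_frames - t)
    for row in masked:
        row[t0:t0 + t] = [0.0] * t

    return masked


def mask_batch(batch, max_freq_width=8, max_time_width=20, seed=None):
    """Apply independently drawn masks to every spectrogram in the batch,
    so mask size and position differ from one input to the next."""
    rng = random.Random(seed)
    return [
        mask_spectrogram(spec, max_freq_width, max_time_width, rng)
        for spec in batch
    ]
```

Calling `mask_batch(batch, seed=0)` on a list of 40-band, 100-frame spectrograms returns a new list of the same shape. Drawing the masks inside the per-example loop, rather than once per batch, is what makes them differ across inputs.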