Interspeech 2020
DOI: 10.21437/interspeech.2020-2656

Deep Scattering Power Spectrum Features for Robust Speech Recognition

Abstract: Deep scattering spectrum consists of a cascade of wavelet transforms and modulus non-linearity. It generates features of different orders, with the first order coefficients approximately equal to the Mel-frequency cepstrum, and higher order coefficients recovering information lost at lower levels. We investigate the effect of including the information recovered by higher order coefficients on the robustness of speech recognition. To that end, we also propose a modification to the original scattering transform …
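The cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not include their proposed modification: it assumes Gaussian band-pass filters as a simple stand-in for wavelets, applies them by FFT, and uses hypothetical names (`scattering`, `gaussian_bandpass`, `q`, `lowpass_bw`) chosen only for illustration. First-order coefficients are the low-pass-averaged modulus of each band; second-order coefficients repeat the filter-and-modulus step on each first-order envelope before averaging.

```python
import numpy as np

def gaussian_bandpass(n, center, bandwidth):
    """Frequency response of an analytic Gaussian band-pass filter.

    Assumption: this is a simple stand-in for a wavelet, not the filter
    bank used in the paper.
    """
    freqs = np.fft.fftfreq(n)  # cycles per sample, in [-0.5, 0.5)
    return np.exp(-0.5 * ((freqs - center) / bandwidth) ** 2)

def apply_filter(x, h_hat):
    """Circular convolution of x with a filter given by its frequency response."""
    return np.fft.ifft(np.fft.fft(x) * h_hat)

def scattering(x, centers1, centers2, q=8.0, lowpass_bw=0.005):
    """Return (first_order, second_order) coefficient arrays for signal x."""
    n = len(x)
    phi_hat = gaussian_bandpass(n, 0.0, lowpass_bw)  # averaging low-pass filter
    first, second = [], []
    for c1 in centers1:
        # Wavelet-like filtering followed by the modulus non-linearity.
        u1 = np.abs(apply_filter(x, gaussian_bandpass(n, c1, c1 / q)))
        first.append(apply_filter(u1, phi_hat).real)
        for c2 in centers2:
            if c2 < c1:  # envelopes vary more slowly than their carrier band
                u2 = np.abs(apply_filter(u1, gaussian_bandpass(n, c2, c2 / q)))
                second.append(apply_filter(u2, phi_hat).real)
    return np.array(first), np.array(second)

# Amplitude-modulated tone: carrier at 0.1 and modulation at 0.01 cycles/sample.
t = np.arange(4000)
x = (1.0 + 0.5 * np.cos(2 * np.pi * 0.01 * t)) * np.cos(2 * np.pi * 0.1 * t)
s1, s2 = scattering(x, centers1=[0.05, 0.1, 0.2], centers2=[0.01, 0.02])
```

In this toy signal the first-order averaging smooths away the 0.01 cycles/sample modulation, while the second-order path through the 0.1 carrier band and the 0.01 modulation band retains it. This is the sense in which higher orders recover information lost at lower levels.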

Cited by 9 publications (5 citation statements)
References 14 publications
“…Direct employment of the raw signal representations with minimal or no information loss, along with powerful learning architectures can effectively tackle the aforementioned issue. This has led to successful single-stream acoustic modelling using raw waveforms [4]-[12], raw magnitude [13], deep scattering spectrum [14] and raw phase spectrum [15] with better or comparable performance to the classic features across various tasks, even for databases as small as TIMIT [16]-[18].…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…First-order scattering coefficients characterize persistent phenomena such as tendency and envelope, while second-order scattering coefficients characterize transient phenomena such as shock signals and amplitude modulation (Anden and Mallat, 2014). The wavelet scattering method has been wildly used for acoustic scene classification (Li et al, 2019), speech recognition (Fousek et al, 2015; Joy et al, 2020), and heart sound classification (Mei et al, 2021), which yielded efficient representations for audio processing. However, wavelet scattering currently was seldom used in ECG analysis and application.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…The latter is challenging to address for waveform-based models due to the fact that the feature extraction process is fully automated and utterance level mean-normalization cannot be performed as in the case of standard non-adaptive filterbank features. Prior work has established that such normalizations can be fundamental in dealing with spurious correlations introduced by different microphones and stationary signal corruptions [2,16].…”
Section: Introduction; citation type: mentioning
confidence: 99%