2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2014.6853588

Deep Scattering Spectrum with deep neural networks

Abstract: State-of-the-art convolutional neural networks (CNNs) typically use a log-mel spectral representation of the speech signal. However, this representation is limited by the spectro-temporal resolution afforded by log-mel filter-banks. A novel technique known as Deep Scattering Spectrum (DSS) addresses this limitation and preserves higher resolution information, while ensuring time warp stability, through the cascaded application of the wavelet-modulus operator. The first order scatter is equivalent to log-mel fe…
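As a rough illustration of the cascaded wavelet-modulus operator described in the abstract, the sketch below (plain NumPy, not the paper's implementation) builds a crude analytic band-pass filter-bank, applies two wavelet-modulus stages, and averages each output over non-overlapping windows. Filter shapes, filter counts, and the window length are placeholder choices, not values from the paper.

```python
# Minimal sketch of a two-stage wavelet-modulus (scattering) cascade.
# All filter parameters are illustrative placeholders.
import numpy as np

def filter_bank(n, num_filters=8):
    """Crude analytic band-pass filters in the frequency domain with
    geometrically spaced centre frequencies (a stand-in for the
    constant-Q wavelet filter-bank used by DSS)."""
    freqs = np.fft.fftfreq(n)
    centres = 0.4 * 2.0 ** (-np.arange(num_filters))
    bank = []
    for fc in centres:
        h = np.exp(-0.5 * ((freqs - fc) / (fc / 4.0)) ** 2)  # Gaussian bump at fc
        h[freqs < 0] = 0.0                                    # keep the filter analytic
        bank.append(h)
    return np.stack(bank)

def wavelet_modulus(x, bank):
    """One cascade stage: |x * psi| for every band-pass filter psi."""
    return np.abs(np.fft.ifft(np.fft.fft(x)[None, :] * bank, axis=-1))

def scatter(x, window=256):
    """First and second order coefficients: wavelet-modulus cascades
    followed by averaging over non-overlapping windows."""
    u1 = wavelet_modulus(x, filter_bank(len(x), 8))        # first order modulus
    s1 = u1.reshape(u1.shape[0], -1, window).mean(-1)      # log-mel-like averages
    bank2 = filter_bank(len(x), 4)
    u2 = np.stack([wavelet_modulus(u, bank2) for u in u1])  # second order modulus
    s2 = u2.reshape(-1, u2.shape[-1] // window, window).mean(-1)
    return np.log(s1 + 1e-8), np.log(s2 + 1e-8)

s1, s2 = scatter(np.random.randn(2 ** 12))
print(s1.shape, s2.shape)   # (8, 16) and (32, 16) frames
```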

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1

Citation Types

0
22
0

Year Published

2015
2015
2021
2021

Publication Types

Select...
4
2
1
1

Relationship

0
8

Authors

Journals

citations
Cited by 31 publications
(22 citation statements)
references
References 25 publications
0
22
0
Order By: Relevance
“…The main challenge in incorporating second order information into speech recognition systems is that first order coefficients typically require some further band-pass filtering to capture local invariants in such spectro-temporal decompositions of speech waveforms. Empirically, the most effective neural architecture for hybrid acoustic models with scattering features has been proposed in [19]. The architecture is a junction network that takes as input first and second order scattering coefficients via separate pipelines, which are then merged into a multi-layer perceptron with several hidden layers.…”
Section: Network Architecture (mentioning)
confidence: 99%
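A hypothetical PyTorch reading of the junction network described in this statement: two separate input pipelines, one per scattering order, whose outputs are concatenated and passed to a multi-layer perceptron with several hidden layers. All dimensions, layer widths, and the number of output targets are placeholders, not values from [19].

```python
# Hypothetical junction network: separate branches for first and second
# order scattering coefficients, merged into an MLP. Sizes are placeholders.
import torch
import torch.nn as nn

class JunctionNet(nn.Module):
    def __init__(self, dim_s1, dim_s2, num_targets):
        super().__init__()
        self.branch_s1 = nn.Sequential(nn.Linear(dim_s1, 512), nn.ReLU())
        self.branch_s2 = nn.Sequential(nn.Linear(dim_s2, 512), nn.ReLU())
        # Merged multi-layer perceptron with several hidden layers.
        self.mlp = nn.Sequential(
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, num_targets),
        )

    def forward(self, s1, s2):
        merged = torch.cat([self.branch_s1(s1), self.branch_s2(s2)], dim=-1)
        return self.mlp(merged)

net = JunctionNet(dim_s1=40, dim_s2=160, num_targets=2000)
logits = net(torch.randn(8, 40), torch.randn(8, 160))   # batch of 8 frames
print(logits.shape)                                      # torch.Size([8, 2000])
```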
“…The pooling is applied with compression rates 3, 2 and 1, respectively. We note here that our configuration of channels and filter sizes differs from prior work [10,19]. The pipeline for second order coefficients is a multi-layer perceptron (MLP) with 512 activation units, followed by batch normalization [22], a ReLU non-linearity, and a dropout block.…”
Section: Network Architecture (mentioning)
confidence: 99%
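Under the same caveat, a hedged PyTorch sketch of the two branches quoted here: the compression rates 3, 2 and 1 are read as pooling factors over three successive convolutional stages for the first order coefficients, and the second order branch is the stated 512-unit layer followed by batch normalization, ReLU, and dropout. Channel counts, filter sizes, and input dimensions are placeholders, since the citing paper states its configuration differs from [10,19].

```python
# Hedged reading of the quoted configuration. Channel counts, kernel sizes
# and input dimensions are placeholders; only the pooling factors (3, 2, 1)
# and the 512-unit BN/ReLU/dropout block come from the quoted text.
import torch
import torch.nn as nn

def conv_stage(c_in, c_out, kernel, pool):
    layers = [nn.Conv1d(c_in, c_out, kernel, padding=kernel // 2), nn.ReLU()]
    if pool > 1:
        layers.append(nn.MaxPool1d(pool))   # "compression rate" read as pooling factor
    return nn.Sequential(*layers)

first_order_pipeline = nn.Sequential(
    conv_stage(1, 64, kernel=5, pool=3),
    conv_stage(64, 128, kernel=5, pool=2),
    conv_stage(128, 256, kernel=5, pool=1),
)

second_order_pipeline = nn.Sequential(
    nn.Linear(160, 512),      # 160 is a placeholder input dimension
    nn.BatchNorm1d(512),
    nn.ReLU(),
    nn.Dropout(p=0.5),
)

s1 = torch.randn(8, 1, 40)    # (batch, channels, first order coefficients)
s2 = torch.randn(8, 160)      # (batch, second order coefficients)
print(first_order_pipeline(s1).shape, second_order_pipeline(s2).shape)
```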
“…WST has enjoyed significant success in various audio [27] and biomedical [30] signal classification tasks. WST demonstrated promising results on the TIMIT dataset for phonetic classification [31] and recognition [32].…”
Section: Introduction (mentioning)
confidence: 99%
“…The restriction to wavelet filters allows deep scattering networks to have explicit, physics-related properties (frequency band, timescales of interest, amplitudes) that greatly simplify the architecture design, in contrast with classical deep convolutional neural networks. Scattering networks have been shown to perform high-quality classification of audio signals [20][21][22] and electrocardiograms [23]. A deep scattering network decomposes the signal's structure through a tree of wavelet convolutions, modulus operations, and average pooling, providing a stable representation at multiple time and frequency scales [20].…”
mentioning
confidence: 99%
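For readers who want to experiment with such a tree of wavelet convolutions, modulus operations, and averaging, a short usage sketch with the open-source Kymatio package is given below. The package choice and the J/Q settings are assumptions of this note, not something the quoted works necessarily used.

```python
# Usage sketch with Kymatio's NumPy frontend (an assumption, not the
# quoted works' tooling): Scattering1D runs the wavelet-convolution /
# modulus / averaging cascade and returns all orders stacked together.
import numpy as np
from kymatio.numpy import Scattering1D

T = 2 ** 13                                     # signal length in samples
scattering = Scattering1D(J=6, shape=T, Q=8)    # 2^6-sample averaging, 8 wavelets per octave
x = np.random.randn(T).astype(np.float32)

Sx = scattering(x)                # shape: (num_paths, T / 2^J)
meta = scattering.meta()          # per-path metadata, including the scattering order
order1 = Sx[meta['order'] == 1]   # first order: log-mel-like coefficients
order2 = Sx[meta['order'] == 2]   # second order: finer modulation detail
print(order1.shape, order2.shape)
```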