2017
DOI: 10.1016/j.csl.2017.02.006

On the relevance of auditory-based Gabor features for deep learning in robust speech recognition

Abstract: Previous studies support the idea of merging auditory-based Gabor features with deep learning architectures to achieve robust automatic speech recognition; however, the cause of the gains from such a combination is still unknown. We believe these representations provide the deep learning decoder with more discriminable cues. Our aim with this paper is to validate this hypothesis by performing experiments with three different recognition tasks (Aurora 4, CHiME 2 and CHiME 3) and assess the discriminability of th…
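The abstract's central objects are spectro-temporal Gabor features computed on an auditory (log-mel) spectrogram. As a minimal sketch of the idea, assuming a generic Gaussian-windowed 2D Gabor filter and an illustrative grid of modulation frequencies (not the paper's exact filterbank or parameter values):

```python
# Sketch of spectro-temporal Gabor feature extraction from a log-mel
# spectrogram. The envelope shape and modulation grid are illustrative
# assumptions, not the filterbank used in the paper.
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(omega_k, omega_n, size=(15, 15)):
    """Complex 2D Gabor: Gaussian envelope times a plane-wave carrier.

    omega_k, omega_n: spectral / temporal modulation frequencies (rad/sample).
    """
    k = np.arange(size[0]) - size[0] // 2   # spectral (channel) axis
    n = np.arange(size[1]) - size[1] // 2   # temporal (frame) axis
    K, N = np.meshgrid(k, n, indexing="ij")
    envelope = np.exp(-(K**2) / (2 * (size[0] / 4) ** 2)
                      - (N**2) / (2 * (size[1] / 4) ** 2))
    carrier = np.exp(1j * (omega_k * K + omega_n * N))
    return envelope * carrier

def gabor_features(log_mel, modulations):
    """Filter a (channels x frames) spectrogram with each Gabor kernel
    and stack the real parts as feature maps."""
    maps = [convolve2d(log_mel, np.real(gabor_kernel(wk, wn)), mode="same")
            for wk, wn in modulations]
    return np.stack(maps)  # (num_filters, channels, frames)

log_mel = np.random.randn(40, 200)  # placeholder spectrogram
grid = [(wk, wn) for wk in (0.0, 0.25, 0.5) for wn in (0.0, 0.25, 0.5)]
print(gabor_features(log_mel, grid).shape)  # (9, 40, 200)
```

Each filter responds to a particular spectro-temporal modulation pattern, which is the kind of cue the paper argues makes the representation more discriminable for a deep network.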

Cited by 19 publications (5 citation statements)
References 40 publications
“…This can effectively extract three-dimensional features from speakers, such as mouth opening area and depth, for a better understanding of mouth movement during speaking. Recent studies [6,11,45] have demonstrated the effectiveness of the Gabor feature extraction method in visual speech recognition tasks. The authors previously applied Gabor feature extraction to English and Mandarin Chinese speech recognition [11], achieving performance comparable to deep learning CNN-based approaches while maintaining system simplicity and explainability.…”
Section: Gabor-based Visual Feature Extraction
Mentioning confidence: 99%
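The statement above describes Gabor filtering of the mouth region for visual speech recognition. A hedged sketch of that style of descriptor using OpenCV's getGaborKernel; the orientation count, filter parameters and mean-energy pooling are illustrative assumptions, not the cited authors' exact pipeline:

```python
# Sketch of Gabor-based visual feature extraction from a mouth-region image.
# Parameter values and the pooling step are assumptions for illustration.
import cv2
import numpy as np

def gabor_bank(ksize=21, sigma=4.0, lambd=10.0, gamma=0.5, n_orient=8):
    """Return Gabor kernels at n_orient evenly spaced orientations."""
    return [cv2.getGaborKernel((ksize, ksize), sigma,
                               theta=np.pi * i / n_orient,
                               lambd=lambd, gamma=gamma, psi=0.0)
            for i in range(n_orient)]

def mouth_gabor_features(gray_roi):
    """Filter a grayscale mouth ROI with the bank and pool each response
    to a mean-energy scalar, giving a compact, explainable descriptor."""
    responses = [cv2.filter2D(gray_roi, cv2.CV_64F, k) for k in gabor_bank()]
    return np.array([np.mean(np.abs(r)) for r in responses])

roi = np.random.randint(0, 256, (64, 96), dtype=np.uint8)  # placeholder ROI
print(mouth_gabor_features(roi))  # 8 orientation-energy features
```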
“…This work claims that the GF-based feature extraction method gave better recognition performance than MFCC, perceptual linear predictive (PLP) and LPC features. In the past, features generated from GF have been used as inputs to a DNN to generate Gabor-DNN features, and to a CNN for improved speech recognition [12,35,36]. One such study incorporated GF into the convolution filter kernels, where a variety of Gabor features served as the feature maps of the convolution layer [12].…”
Section: Related Work
Mentioning confidence: 99%
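One concrete reading of "GF into convolution filter kernels" is to seed the first convolution layer of a CNN with Gabor-shaped kernels. A minimal PyTorch sketch under that assumption; the kernel generator, layer sizes and orientation grid are invented for illustration:

```python
# Seeding a CNN's first convolution with Gabor kernels, one way to realize
# "Gabor features as feature maps of the convolution layer". All parameter
# choices here are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

def gabor_2d(theta, ksize=9, sigma=2.0, lambd=4.0):
    """Real-valued 2D Gabor kernel at orientation theta."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / lambd)).astype(np.float32)

conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=9, padding=4)
with torch.no_grad():
    for i in range(8):
        conv.weight[i, 0] = torch.from_numpy(gabor_2d(np.pi * i / 8))
    conv.bias.zero_()

# The Gabor-shaped kernels can be kept fixed or fine-tuned with the rest
# of the network during training.
spec = torch.randn(1, 1, 40, 200)  # (batch, channel, freq, time)
print(conv(spec).shape)            # torch.Size([1, 8, 40, 200])
```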
“…In the following paragraphs, we present details of the operation of the proposed architecture. In the past, features generated from GF have been used as inputs to a DNN to generate Gabor-DNN features, and to a CNN for improved speech recognition [12,35,36]. One such study incorporated GF into the convolution filter kernels, where a variety of Gabor features served as the feature maps of the convolution layer [12].…”
Section: High Speed Pipelined Architecture
Mentioning confidence: 99%
“…The input layer with two neurons (one for each speech-envelope stream value) is followed by layers of increasing width, with 4, 8, 12, 16 and 20 neurons, all of which use a standard tanh activation function. Inspired by the layout of speech-related neural networks [Martinez, Mallidi and Meyer 2017], this network size was chosen as a compromise between the number of layers and a reasonable number of adjustable parameters, given the amount of training data specified above. Tests with slightly varying network depth and width did not show relevant differences in the outcomes as long as the parameter count was kept in the same range; we therefore settled on the pyramidal layout to make the dimension increase preferably smooth.…”
Section: Network Layout
Mentioning confidence: 99%
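The quoted layout is concrete enough to write down directly: a two-input network widening through tanh layers of 4, 8, 12, 16 and 20 units. A minimal PyTorch sketch; any output head on top of the 20-unit layer is an assumption, since the statement does not specify one:

```python
# Pyramidal MLP matching the quoted layout: 2 -> 4 -> 8 -> 12 -> 16 -> 20,
# with tanh activations throughout. No output head is attached, as the
# citation statement does not describe one.
import torch
import torch.nn as nn

widths = [2, 4, 8, 12, 16, 20]
layers = []
for n_in, n_out in zip(widths[:-1], widths[1:]):
    layers += [nn.Linear(n_in, n_out), nn.Tanh()]
model = nn.Sequential(*layers)

x = torch.randn(32, 2)  # batch of speech-envelope stream value pairs
print(model(x).shape)   # torch.Size([32, 20])
```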