Interspeech 2016
DOI: 10.21437/interspeech.2016-279
Fusion Strategies for Robust Speech Recognition and Keyword Spotting for Channel- and Noise-Degraded Speech

Abstract: Recognizing speech under high levels of channel and/or noise degradation is challenging. Current state-of-the-art automatic speech recognition systems are sensitive to changing acoustic conditions, which can cause significant performance degradation. Noise-robust acoustic features can improve speech recognition performance under varying background conditions, and robust modeling techniques together with multiple-system fusion are typically observed to improve performance even further. This work inves…
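The abstract is truncated, but the multiple-system fusion it mentions is concrete enough to illustrate. Below is a minimal sketch of one common fusion strategy, score-level (log-linear) combination of frame-level posteriors from several acoustic models. The function name, weights, and array shapes are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def fuse_posteriors(posterior_list, weights=None, eps=1e-10):
    """Log-linear (weighted geometric mean) fusion of frame-level
    posteriors from multiple acoustic models.

    posterior_list: list of (frames, classes) arrays, one per system.
    weights: per-system fusion weights; uniform if None.
    """
    n = len(posterior_list)
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, float)
    # Sum the weighted log-posteriors, then renormalize per frame.
    log_fused = sum(wi * np.log(p + eps) for wi, p in zip(w, posterior_list))
    fused = np.exp(log_fused)
    return fused / fused.sum(axis=1, keepdims=True)

# Hypothetical usage: two systems, 100 frames, 40 output classes.
rng = np.random.default_rng(0)
a = rng.dirichlet(np.ones(40), size=100)
b = rng.dirichlet(np.ones(40), size=100)
fused = fuse_posteriors([a, b], weights=[0.6, 0.4])
```

The geometric mean is a standard choice here because it rewards classes on which the systems agree; the fusion weights would in practice be tuned on held-out data.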

Cited by 3 publications (2 citation statements; citing years 2017 and 2020)
References 15 publications

Citation Statements
“…We observed that time-frequency convolution (using TFCNN [11,17,18,19,20,21]) performed better than 1-D frequency convolution, and hence we focused on TFCNN acoustic models for the experiments presented in this paper. The TFCNN architecture is the same as in [11,22], where two parallel convolutional layers are used at the input, one performing convolution across time and the other across frequency on the input filterbank features. The TFCNNs used 75 filters for time convolution and 200 filters for frequency convolution.…”
Section: Acoustic Model
confidence: 99%
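For readers unfamiliar with the TFCNN front end quoted above, here is a minimal sketch of the described two-stream input: a convolution across time with 75 filters and a convolution across frequency with 200 filters, applied in parallel to the filterbank features. The kernel sizes, pooling, and downstream layers are illustrative assumptions not specified in the quote.

```python
import torch
import torch.nn as nn

class TFCNNFrontEnd(nn.Module):
    """Sketch of a time-frequency convolution front end: two parallel
    convolutional layers over the input filterbank features, one
    convolving across time and one across frequency (75 and 200
    filters, per the quoted passage). Kernel and pooling sizes are
    illustrative assumptions."""

    def __init__(self, time_filters=75, freq_filters=200):
        super().__init__()
        # Input treated as a 1-channel image: (batch, 1, time, freq).
        self.time_conv = nn.Conv2d(1, time_filters, kernel_size=(5, 1))  # across time
        self.freq_conv = nn.Conv2d(1, freq_filters, kernel_size=(1, 8))  # across frequency
        self.pool_t = nn.MaxPool2d(kernel_size=(3, 1))
        self.pool_f = nn.MaxPool2d(kernel_size=(1, 3))

    def forward(self, x):
        # x: (batch, 1, frames, mel_bands), e.g. 40-dim log filterbanks.
        t = self.pool_t(torch.relu(self.time_conv(x)))
        f = self.pool_f(torch.relu(self.freq_conv(x)))
        # Flatten both streams and concatenate before the fully
        # connected layers of the acoustic model (not shown here).
        return torch.cat([t.flatten(1), f.flatten(1)], dim=1)

# Hypothetical usage: batch of 16, 11-frame context, 40 mel filterbanks.
feats = torch.randn(16, 1, 11, 40)
out = TFCNNFrontEnd()(feats)
```

The design point the quote makes is that the two streams see the same input but specialize along orthogonal axes, which is why this tends to outperform a single 1-D frequency convolution on degraded speech.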
“…However, the performance is not very sensitive to w_s and w_max. Therefore, the multi-source fusion algorithm is proposed to help the system choose a better sensitivity value [17]. How the vehicle information is processed to execute the algorithm is explained as follows.…”
Section: Implementation of KWS System with Multi-Source Fusion
confidence: 99%
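The quoted passage elides how the vehicle information is actually processed. As a purely hypothetical illustration of a multi-source fusion rule that selects a KWS sensitivity between w_s and w_max, consider the sketch below; the input signals, weights, and linear rule are invented for illustration, and the real algorithm is the one defined in that paper's reference [17].

```python
def choose_sensitivity(speed_kmh, hvac_level, window_open,
                       w_s=0.4, w_max=0.9):
    """Hypothetical multi-source fusion rule: map in-vehicle noise
    indicators to a KWS detection sensitivity in [w_s, w_max].
    All signals and weights here are illustrative assumptions."""
    # Normalize each noise indicator to [0, 1].
    speed_term = min(speed_kmh / 130.0, 1.0)   # highway speed ~ max road noise
    hvac_term = hvac_level / 4.0               # fan level 0..4
    window_term = 1.0 if window_open else 0.0
    # Weighted fusion of the noise evidence from the three sources.
    noise_score = 0.5 * speed_term + 0.3 * hvac_term + 0.2 * window_term
    # Noisier conditions -> higher sensitivity (more permissive detector).
    return w_s + noise_score * (w_max - w_s)

# Hypothetical usage: moderate highway driving, fan at level 2.
print(choose_sensitivity(speed_kmh=100, hvac_level=2, window_open=False))
```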