2015
DOI: 10.1109/taslp.2015.2409785

Learning Dynamic Stream Weights For Coupled-HMM-based Audio-visual Speech Recognition

Abstract: With the increasing use of multimedia data in communication technologies, the idea of employing visual information in automatic speech recognition (ASR) has recently gathered momentum. In conjunction with the acoustical information, the visual data enhances the recognition performance and improves the robustness of ASR systems in noisy and reverberant environments. In audio-visual systems, dynamic weighting of audio and video streams according to their instantaneous confidence is essential for reliably and sys…
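To show what dynamic weighting of audio and video streams means at the level of a single frame, the sketch below assumes the common log-linear formulation: for frame t and a given (composite) coupled-HMM state, the audio log-likelihood is scaled by a weight λ_t and the video log-likelihood by 1 - λ_t. Because the abstract is truncated, this form is an assumption, not a statement of the paper's exact model, and the function names `fused_log_likelihood` and `fuse_sequence` are hypothetical.

```python
from typing import List, Sequence


def fused_log_likelihood(log_b_audio: float, log_b_video: float, lam: float) -> float:
    """Log-linear stream fusion for one frame and one (composite) state.

    Assumed form: lam * log b_A + (1 - lam) * log b_V, with lam in [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("stream weight must lie in [0, 1]")
    return lam * log_b_audio + (1.0 - lam) * log_b_video


def fuse_sequence(
    log_b_audio: Sequence[float],
    log_b_video: Sequence[float],
    weights: Sequence[float],
) -> List[float]:
    """Apply a hypothetical per-frame weight trajectory lambda_t to a whole utterance."""
    if not (len(log_b_audio) == len(log_b_video) == len(weights)):
        raise ValueError("all sequences must have one entry per frame")
    return [fused_log_likelihood(a, v, w) for a, v, w in zip(log_b_audio, log_b_video, weights)]


if __name__ == "__main__":
    # Illustrative numbers only: a clean frame (trust audio) and a noisy frame (trust video).
    print(fuse_sequence([-2.0, -9.0], [-5.0, -4.0], [0.8, 0.2]))
```

Lowering λ_t on frames where the audio is unreliable lets the video stream dominate the combined score for those frames only, which is the sense in which the weighting is "dynamic" rather than fixed per utterance.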

Cited by 58 publications (49 citation statements)
References 41 publications

“…Optimization of stream weights on a frame-by-frame basis has proven its merit for coupled-HMM systems in [34]. It will be interesting to extend this technique to the presented turbodecoding system, adapting the stream weight according to estimated SNR, observation uncertainty, and model-based reliability measures like dispersion and entropy, in order to also consider the time-varying utility of video information in the process.…”
Section: Discussion (mentioning)
confidence: 99%
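The statement above names estimated SNR, dispersion and entropy as inputs for adapting the stream weight. A minimal sketch of how such per-frame reliability measures could be computed and mapped to a weight follows. It assumes the frequently used N-best dispersion definition (average pairwise gap among the N highest log-likelihoods) and a logistic mapping whose coefficients are hypothetical and untrained; neither choice is taken from [34] or from the paper above.

```python
import math
from typing import Sequence


def posterior_entropy(posteriors: Sequence[float]) -> float:
    """Shannon entropy (nats) of a per-frame state posterior; high entropy suggests low confidence."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def dispersion(log_likelihoods: Sequence[float], n_best: int = 4) -> float:
    """Average pairwise gap among the N best log-likelihoods; large gaps suggest a confident stream."""
    top = sorted(log_likelihoods, reverse=True)[:n_best]
    n = len(top)
    if n < 2:
        return 0.0
    total = sum(top[i] - top[j] for i in range(n) for j in range(i + 1, n))
    return 2.0 * total / (n * (n - 1))


def stream_weight(features: Sequence[float], coeffs: Sequence[float], bias: float) -> float:
    """Map reliability features (e.g. SNR, entropies, dispersions) to an audio weight in [0, 1].

    Uses a numerically stable logistic function; coeffs and bias are hypothetical parameters
    that would normally be trained on held-out data.
    """
    z = bias + sum(c * f for c, f in zip(coeffs, features))
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    audio_post = [0.85, 0.05, 0.05, 0.05]  # peaked: audio looks reliable on this frame
    video_post = [0.3, 0.3, 0.2, 0.2]      # flat: video looks unreliable on this frame
    feats = [posterior_entropy(audio_post), posterior_entropy(video_post)]
    # Hypothetical coefficients: audio entropy lowers the audio weight, video entropy raises it.
    print(stream_weight(feats, coeffs=[-2.0, 2.0], bias=0.0))
```

A logistic mapping keeps the weight inside the valid range without clipping, and each reliability feature enters as a separate input, so SNR or observation-uncertainty estimates can be appended to the feature vector in the same way.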
“…AVASR system consists of two important units: front-end unit and back-end unit (Abdelaziz et al, 2015). The main purpose of the front-end unit is preprocessing and feature extraction.…”
Section: System Description for Hindi AVASR (mentioning)
confidence: 99%
“…The work was inspired by previous applications of DSWs, which were initially proposed in the context of audiovisual ASR. Pioneering work in this regard was conducted in [20] for probabilistic inference incorporating DSWs into hidden Markov models (HMMs). Compared to conventional Bayesian fusion techniques, this allows to rapidly adapt the estimation process.…”
Section: Introduction (mentioning)
confidence: 99%
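To make the idea of incorporating DSWs into HMM inference concrete, the following sketch runs a log-domain forward recursion in which each frame's emission score is the stream-weighted combination of audio and video log-likelihoods. For brevity it treats the coupled HMM's composite (audio, video) states as one flattened state space, which is an assumption; the function names are hypothetical, and this is not presented as the specific algorithm of [20]. Because weighted emissions are no longer normalized densities, the returned value is a log-score for comparing hypotheses rather than a true log-likelihood.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a non-empty sequence."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def weighted_forward_score(
    log_pi: Sequence[float],
    log_A: Sequence[Sequence[float]],
    log_b_audio: Sequence[Sequence[float]],
    log_b_video: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Log-domain forward pass with a dynamic audio weight lambda_t per frame.

    log_pi[j]       initial log-probability of (composite) state j
    log_A[i][j]     transition log-probability i -> j (use -inf for forbidden transitions)
    log_b_*[t][j]   per-frame, per-state stream log-likelihoods
    weights[t]      audio weight lambda_t; the video stream gets 1 - lambda_t
    """
    T, n = len(weights), len(log_pi)
    if T == 0:
        raise ValueError("need at least one frame")

    def emit(t: int, j: int) -> float:
        lam = weights[t]
        return lam * log_b_audio[t][j] + (1.0 - lam) * log_b_video[t][j]

    # Initialization, then recursion over frames; each step sums over predecessor states.
    alpha = [log_pi[j] + emit(0, j) for j in range(n)]
    for t in range(1, T):
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n)]) + emit(t, j)
            for j in range(n)
        ]
    return logsumexp(alpha)
```

Since λ_t is looked up per frame inside `emit`, the weighting can change from one frame to the next without retraining the HMM parameters; replacing `logsumexp` with `max` in the recursion would give the corresponding Viterbi score.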