2022
DOI: 10.3390/s22155501

Reliability-Based Large-Vocabulary Audio-Visual Speech Recognition

Abstract: Audio-visual speech recognition (AVSR) can significantly improve performance over audio-only recognition for small or medium vocabularies. However, current AVSR, whether hybrid or end-to-end (E2E), still does not appear to make optimal use of this secondary information stream as the performance is still clearly diminished in noisy conditions for large-vocabulary systems. We, therefore, propose a new fusion architecture, the decision fusion net (DFN). A broad range of time-variant reliability measures are used a…

Citations: cited by 4 publications (1 citation statement)
References: 52 publications
“…Secondly, the signal model can assist scientists in better understanding the audio source that they supplied to the computer. Last but not the least, the reason that the signal model is so important is that it can work perfectly well in practice and help people realize practical systems effectively [4,5,6].…”
Section: Speech Recognition (mentioning)
confidence: 99%