2004 IEEE International Conference on Acoustics, Speech, and Signal Processing
DOI: 10.1109/icassp.2004.1326155

DBN based multi-stream models for audio-visual speech recognition


Cited by 53 publications (35 citation statements)
References 6 publications
“…We establish that the correct way to treat asynchrony in audio-visual speech recognition is within word boundaries and propose a new DBN phoneme model able to exploit its asynchrony without being limited to small vocabulary tasks. Our model thus enjoys the benefits of the DBNs documented in [3], [4], [5] overcoming the problems encountered by Graviet et al when extending their use to phonemes and large vocabulary tasks [6].…”
Section: Introduction (mentioning)
confidence: 89%
“…Previous works [3], [4], [5] and our own experiments, see section IV, show the benefits of allowing audio-visual asynchrony within word boundaries. Analyzing why those asynchronous DBN models work better than the traditional MSHMMs, we develop a processing technique to overcome asynchrony by an additional processing step on visual features when MSHMMs are used for recognition.…”
Section: A Preliminary Idea Based On Model Analysis (mentioning)
confidence: 99%
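The snippet above contrasts asynchronous DBN models with conventional multi-stream HMMs (MSHMMs) and credits the DBNs' gain to allowing audio-visual asynchrony within word boundaries. As a rough illustration only, and not the model of the cited paper, the sketch below enumerates a composite (audio, visual) state space for a single word in which the two streams may drift apart by at most a hypothetical `max_async` states, and combines per-stream log-likelihoods with an assumed audio stream weight. All function names and parameters are invented for illustration.

```python
from itertools import product


def asynchronous_states(n_audio, n_visual, max_async):
    """Composite (audio, visual) state pairs inside one word model.

    Hypothetical parameter max_async bounds how far the two streams may
    drift apart; max_async=0 reduces to a fully synchronous model.
    """
    return [
        (a, v)
        for a, v in product(range(n_audio), range(n_visual))
        if abs(a - v) <= max_async
    ]


def allowed_transitions(states):
    """Left-to-right moves: each stream stays or advances by one state,
    and the successor must stay inside the permitted asynchrony band."""
    allowed = set(states)
    return {
        (a, v): [
            (a + da, v + dv)
            for da in (0, 1)
            for dv in (0, 1)
            if (a + da, v + dv) in allowed
        ]
        for a, v in states
    }


def stream_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Weighted sum of per-stream log-likelihoods, with stream weights
    assumed to sum to one (a common MSHMM convention)."""
    visual_weight = 1.0 - audio_weight
    return audio_weight * log_b_audio + visual_weight * log_b_visual


if __name__ == "__main__":
    states = asynchronous_states(3, 3, max_async=1)
    print(len(states))  # 7 composite states instead of 9
    print(allowed_transitions(states)[(0, 1)])
    print(stream_log_likelihood(-2.0, -4.0, audio_weight=0.75))
```

Because every word starts in (0, 0) and ends in (n_audio - 1, n_visual - 1), the two streams are forced back into step at word boundaries while being free to desynchronize in between, which is the kind of constraint the quoted passage describes.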
“…Early multistream work also includes that of HMM decomposition [7], where both speech and noise are consider a separate stream. Dynamic Bayesian networks (DBNs) have also been used for multi-stream [8], including audio-visual speech recognition [9,10,11]. Even HTK has the ability to represent multiple synchronous acoustic streams.…”
Section: Introduction (mentioning)
confidence: 99%