1996
DOI: 10.1007/978-3-662-13015-5_37

Audiovisual Sensory Integration Using Hidden Markov Models

citations
Cited by 9 publications
(9 citation statements)
references
References 0 publications
“…There have been various implementations of the DI architecture in speech recognition (see, e.g., [2], [12], [18], [21], [30], [42], [44], [49], [54], and [59]) and psychophysical modeling (see [11] and [16]). …”
Section: A Theoretical Background
Citation type: mentioning
confidence: 99%
“…Using combined audio and visual features, recognition performance was improved by a maximum of 10% at high and low SNR's over an audio-only recogniser. Future work will focus on finding more effective ways of combining the audio and visual information with the aim of ensuring that the combined performance is always at least as good as the performance using either modality [1,14,16,17] and in deriving more discriminative features from the scale histogram.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…It has already been shown [1,6,8,10,13,15,16,17] that the incorporation of visual information with acoustic speech recognition leads to a more robust recogniser. While the visual cues of speech alone are unable to discriminate between all phonemes (e.g.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The models in this study were trained only with 'clean' data and used a single system. Other approaches using HMMs have attempted to compute an explicit weighting factor to bias the recognition in favour of the visual component when acoustic noise levels increase and thus to build an adaptive system [1,17]. Although all of the systems have demonstrated the benefits of adding a visual component to the recognition process, especially when the acoustic noise levels are high, none has yet clearly demonstrated a bimodal performance that is uniformly better than the unimodal performance in either of the two domains.…”
Section: Architectures Using HMMs
Citation type: mentioning
confidence: 99%