Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers
DOI: 10.1109/acssc.1994.471515
Sensory integration in audiovisual automatic speech recognition

Cited by 12 publications (9 citation statements) · References 8 publications
“…The main attraction of this approach in the context of continuous speech recognition is that it is computationally tractable, since only a single pattern-matching stage is used, and that existing procedures for training and testing HMMs can be applied without significant modification. It has been claimed that an advantage of this approach is that it ensures that the final hypothesis is based on the time-aligned audio-visual data, because features are combined for each analysis time frame [4]. In this section we argue that there are a number of reasons to allow asynchronies between the audio and visual data streams, and make a case for the use of decision-fusion algorithms at the utterance level.…”
Section: Feature or Decision Fusion?
confidence: 98%
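The feature-versus-decision fusion trade-off discussed in this excerpt can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the fixed stream weight, and the per-utterance log-likelihood interface are hypothetical, not taken from the cited papers.

```python
import numpy as np

def feature_fusion(audio_frames, visual_frames):
    """Early (feature-level) fusion: concatenate the audio and visual
    feature vectors for each analysis frame, so a single HMM observes
    the joint, frame-synchronous data.  Both inputs are arrays of
    shape (num_frames, feature_dim)."""
    assert len(audio_frames) == len(visual_frames)
    return np.concatenate([audio_frames, visual_frames], axis=1)

def decision_fusion(audio_loglik, visual_loglik, audio_weight=0.7):
    """Late (decision-level) fusion: each modality is decoded
    separately and the per-utterance log-likelihoods are combined,
    so the two streams need not remain frame-synchronous."""
    return audio_weight * audio_loglik + (1.0 - audio_weight) * visual_loglik
```

Feature fusion forces the streams into lock-step per frame; decision fusion only reconciles them once per utterance, which is what permits the asynchronies the excerpt argues for.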
“…Therefore, algorithms that reduce the dimensionality of the feature representations, such as linear discriminant analysis (LDA), are often applied to the data [3]. Silsbee [4] argues that this approach is advantageous because the most likely HMM state sequence is determined by the joint audio-visual data.…”
Section: AV Integration Schemes
confidence: 99%
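The dimensionality reduction this excerpt mentions can be illustrated with a two-class Fisher LDA on concatenated audio-visual feature vectors. This is a generic textbook sketch, not the procedure of [3]; the small ridge term added to the scatter matrix is an assumption for numerical stability.

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Two-class Fisher LDA: find the single projection direction that
    best separates classes X0 and X1 (arrays of shape (n, d)), reducing
    a high-dimensional joint audio-visual feature vector to a scalar."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter = pooled covariance, unnormalized.
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    # Ridge term keeps Sw invertible when features are correlated.
    w = np.linalg.solve(Sw + 1e-6 * np.eye(Sw.shape[0]), m1 - m0)
    return w / np.linalg.norm(w)
```

Projecting each frame's concatenated features onto one or a few such directions shrinks the observation dimension the HMM must model while keeping class separability.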
“…Regardless of the signal-to-noise ratio, most systems perform better using both acoustical and optical sources of information than when using only one source (Bregler, Omohundro, et al., 1994; Bregler, Hild, et al., 1993; Mak & Allen, 1994; Petajan, 1984; Petajan, Bischoff, et al., 1988; Silsbee, 1994; Silsbee, 1993; Smith, 1989; Stork, Wolff, et al., 1992; Yuhas, Goldstein, et al., 1989). At a signal-to-noise ratio of zero on a 500-word task, Silsbee (1993) achieves word recognition accuracies of 38%, 22%, and 58%, respectively, using acoustical information, optical information, and both sources of information.…”
Section: Systems
confidence: 99%
“…The first approach uses a comparator to merge the two independently recognized acoustical and optical events. This comparator may consist of a set of rules (e.g., if the top two phones from the acoustic recognizer are /t/ and /p/, choose the one ranked higher by the optical recognizer) (Petajan, Bischoff, et al., 1988) or a fuzzy-logic integrator (e.g., one that applies linear weights to the acoustically and optically recognized phones) (Silsbee, 1993; Silsbee, 1994). The second approach performs recognition using a vector that includes both acoustical and optical information; such systems typically use neural networks to combine the optical information with the acoustic to improve the signal-to-noise ratio before phonemic recognition (Yuhas, Goldstein, et al., 1989; Bregler, Omohundro, et al., 1994; Bregler, Hild, et al., 1993; Stork, Wolff, et al., 1992; Silsbee, 1994).…”
Section: Systems
confidence: 99%
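The two comparator styles in this excerpt can be sketched as follows. The single /t/-vs-/p/ rule shown is the one example the excerpt gives, not the full published rule set; the function names, score dictionaries, and the weight `alpha` are illustrative assumptions.

```python
def rule_comparator(acoustic_ranking, visual_ranking):
    """Rule-based comparator in the spirit of Petajan, Bischoff, et al.
    (1988): when the acoustic recognizer's top two candidates are the
    visually distinguishable pair /t/ and /p/, defer to whichever the
    optical (lip-reading) recognizer ranks higher; otherwise keep the
    acoustic top choice.  Rankings are lists of phone labels, best first."""
    top_two = set(acoustic_ranking[:2])
    if top_two == {"t", "p"}:
        for phone in visual_ranking:          # visual order breaks the tie
            if phone in top_two:
                return phone
    return acoustic_ranking[0]

def weighted_integrator(acoustic_scores, visual_scores, alpha=0.6):
    """Linear weighting of per-phone scores from the two recognizers,
    analogous to the fuzzy-logic integrator the excerpt attributes to
    Silsbee (1993, 1994).  Scores are dicts mapping phone -> confidence."""
    phones = set(acoustic_scores) | set(visual_scores)
    combined = {p: alpha * acoustic_scores.get(p, 0.0)
                   + (1.0 - alpha) * visual_scores.get(p, 0.0)
                for p in phones}
    return max(combined, key=combined.get)
```

Both routines operate on already-recognized events, which is what distinguishes this comparator family from the second, vector-level approach the excerpt describes.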