2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
DOI: 10.1109/icassp.2000.859318

Audio-visual intent-to-speak detection for human-computer interaction

Cited by 23 publications (11 citation statements)
References 8 publications
“…Progress in addressing some or all of these questions can also benefit other areas where joint audio and visual speech processing is suitable [139], such as speaker identification and verification [49], [66], [109], [136], [140][141][142], visual text-to-speech [143][144][145], speech event detection [146], video indexing and retrieval [147], speech enhancement [102], [104], coding [148], signal separation [149], [150], and speaker localization [151][152][153]. Improvements in these areas will result in more robust and natural human-computer interaction.…”
Section: Summary and Discussion (mentioning)
confidence: 99%
“…In this case, visual information could be very useful since it is completely independent of the acoustic environment. For instance, in a previous study, de Cuetos et al. (2000) used a basic Visual Voice Activity Detector (V-VAD) for detecting a speaker's speech activity in front of a computer. For this, either specific lip parameters or the average luminance of the mouth picture can be used (Iyengar and Neti, 2001).…”
Section: Application to Automatic Voice Activity Detection (mentioning)
confidence: 99%
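The luminance-based visual voice activity detection mentioned in the excerpt above can be sketched roughly as follows. This is a minimal illustrative sketch, not the actual method of de Cuetos et al. (2000) or Iyengar and Neti (2001): the function name, the fixed mouth region, the window length, and the threshold are all assumptions, and a real system would obtain the mouth region from a face/lip tracker.

```python
import numpy as np

def mouth_luminance_vad(frames, mouth_box, window=15, threshold=2.0):
    """Flag speech activity from variation in average mouth-region luminance.

    frames    : iterable of grayscale images as 2-D numpy arrays (0-255)
    mouth_box : (top, bottom, left, right) pixel bounds of the mouth ROI,
                assumed to come from an upstream face/mouth tracker
    window    : number of frames over which luminance variation is measured
    threshold : luminance std-dev above which a frame is marked "speaking"
    """
    top, bottom, left, right = mouth_box
    # Average luminance of the mouth region, one value per frame.
    lum = np.array([f[top:bottom, left:right].mean() for f in frames])

    active = np.zeros(len(lum), dtype=bool)
    for i in range(len(lum)):
        seg = lum[max(0, i - window + 1): i + 1]
        # Lip motion while speaking modulates mouth-region brightness, so a
        # high short-term standard deviation is taken as speech activity.
        active[i] = seg.std() > threshold
    return active

# Usage example with synthetic frames (64x64, mouth ROI in the lower half).
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(64, 64)).astype(float) for _ in range(100)]
speaking = mouth_luminance_vad(frames, mouth_box=(40, 60, 16, 48))
print(speaking[:10])
```

The alternative mentioned in the excerpt, using specific lip parameters (e.g., mouth opening height and width) instead of raw luminance, would follow the same thresholding structure but require explicit lip-contour tracking.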
“…Nevertheless, even with this complexity-effectiveness trade-off, numerous systems have attempted to fuse multi-modal information for a variety of applications. Examples include audio-visual speech recognition systems employing a single camera and a microphone, which achieve higher speech recognition accuracy and greater robustness to noise [5][6][7][8][9][10][11][12][13][14][15]. Other applications include audio-visual sound localization [2,3], where a speaker is localized visually using multiple cameras and acoustically using multiple microphones.…”
Section: Literature Review (mentioning)
confidence: 99%