1995 International Conference on Acoustics, Speech, and Signal Processing
DOI: 10.1109/icassp.1995.479827

Knowing who to listen to in speech recognition: visually guided beamforming

Cited by 16 publications (11 citation statements)
References 6 publications
“…Progress in addressing some or all of these questions can also benefit other areas where joint audio and visual speech processing is suitable [139], such as speaker identification and verification [49], [66], [109], [136], [140]-[142], visual text-to-speech [143]-[145], speech event detection [146], video indexing and retrieval [147], speech enhancement [102], [104], coding [148], signal separation [149], [150], and speaker localization [151]-[153]. Improvements in these areas will result in more robust and natural human-computer interaction.…”
Section: Summary and Discussion (mentioning)
confidence: 99%
“…A reliable face and mouth tracker could provide the tracking necessary for such a "lipreading" system. It has been shown that a more accurate localization in space can be delivered visually than acoustically [2].…”
Section: Introduction (mentioning)
confidence: 99%
“…[16] The clear geometrical representation of the problem makes it a favorite feature to be used when approaching such a task by a machine setup [9, 11, 12, 14-18]. Another cue known to have notable importance in human dimensional hearing is the interaural level differences (ILDs). Surprisingly ILDs have seldom been used in actual system implementations because they are believed to have unfavorable frequency dependence and unreliability.…”
Section: A Sound Localization By Machine (mentioning)
confidence: 98%
“…[9-12] In other studies a human model has been followed to some degree, resulting in constraints in applicability and limited accuracy [13]. A significant amount of work has been devoted to devices with a limited functionality (e.g., constrained to localization in a single half-plane while still using large sensor structures) [12-14] or the help of a nonacoustical modality has been used (e.g., vision) [14]. In contrast to large, fixed sensor arrays for special situations and environments, this work concentrates on a compact, mobile sensor array that is suited for a mobile robot to localize 3D sound sources with moderate accuracy.…”
Section: A Sound Localization By Machine (mentioning)
confidence: 99%