2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings
DOI: 10.1109/icassp.2006.1660092
An Analysis of Visual Speech Information Applied to Voice Activity Detection

Cited by 48 publications (34 citation statements) · References 7 publications
“…Sodoyer et al. [6] extended this idea to combine audio-visual speech processing and blind source separation to form an early contribution to audio-visual source separation research. More recently, Wang et al. [7] and Sodoyer et al. [8] have used visual information to help solve the convolutive case of BSS. However, audio-visual BSS is still in the early stages of research compared to audio-only BSS; this paper gives an overview of the research area so far, and provides recently obtained results and suggestions for future research.…”
Section: Does the Listener Recognize What One Person Is Saying Among…
confidence: 99%
“…Sodoyer et al. [6], [8] extract the internal width and height of the lips using a chroma-key process and contour tracking on lips with blue makeup. Wang et al. [7] and Aubrey et al. [12] use facial features found on the basis of an AAM.…”
Section: Feature Extraction
confidence: 99%
“…Even though solutions for these problems have been proposed (e.g., [11,19,32]), various researchers have argued that taking the visual signal into account (if available) can help in addressing these issues, e.g. because the presence or absence of lip movements can help in distinguishing noise from speech [35], and because visual cues can help for speech segmentation. Moreover, importantly, visual cues such as mouth and head movements typically precede the actual onset of speech [40], allowing for an earlier detection of speech events, which in turn may be beneficial for the robustness of speech recognition systems.…”
Section: Introduction
confidence: 99%
“…In a preliminary work, lip movements have been shown to be good candidates to characterize the opposition between silence and non-silence activity (Sodoyer et al., 2006), the lip-shape variations being generally smaller in silence sections. Therefore, following this previous work, we chose to describe the lip-shape movements with one dynamic parameter, summing the absolute values of the two lip parameter derivatives (Sodoyer et al., 2006):…”
Section: A Dynamic Lip Parameter For Silence Vs Non-silence Chara…
confidence: 99%
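The dynamic parameter described in the citation above — the sum of the absolute values of the derivatives of the two lip parameters (internal lip width and height) — can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name, the choice of `np.gradient` for the derivative, and the frame rate are all assumptions.

```python
import numpy as np

def dynamic_lip_parameter(width, height, fps=50.0):
    """Sketch of a dynamic lip parameter: sum of the absolute
    derivatives of internal lip width and height over time.

    width, height : 1-D sequences of lip measurements per video frame
    fps           : assumed frame rate used to scale the derivative
    """
    dt = 1.0 / fps
    dw = np.gradient(np.asarray(width, dtype=float), dt)   # d(width)/dt
    dh = np.gradient(np.asarray(height, dtype=float), dt)  # d(height)/dt
    return np.abs(dw) + np.abs(dh)

# A static lip shape (as in a silence section) yields values near zero,
# while articulatory movement yields large values.
silence = dynamic_lip_parameter(np.ones(10), np.ones(10))
```

Consistent with the quoted passage, the parameter stays small during silence (little lip-shape variation) and grows during speech activity, which is what makes it usable for silence vs. non-silence characterization.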