Lecture Notes in Computer Science
DOI: 10.1007/978-3-540-76414-4_19

Finding Lips in Unconstrained Imagery for Improved Automatic Speech Recognition

Abstract: Lip movement of a speaker conveys important visual speech information and can be exploited for Automatic Speech Recognition. While previous research demonstrated that visual modality is a viable tool for identifying speech, the visual information has yet to become utilized in mainstream ASR systems. One obstacle is the difficulty in building a robust visual front end that tracks lips accurately in a real-world condition. In this paper we present our current progress in addressing the issue. We examine the use …

Cited by 4 publications (3 citation statements)
References 8 publications
“…However, if a subject moves in a direction normal to the camera plane (closer or further to the camera) where the current candidate ROI will either no longer contain the full face or will contain additional background information, an ROI scaling algorithm was developed. It was found in previous research [21] that the expected value for the skin class in the shifted-hue (sH) color plane is lower than the background class. Therefore, if we determine the gradient of the candidate ROI for the sH color plane, then the gradient magnitudes are expected to be larger around the face perimeter.…”
Section: ROI Scaling (mentioning)
confidence: 95%
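The gradient idea in the statement above can be illustrated with a minimal sketch. It is not the citing authors' implementation: the synthetic plane, the values 0.8 and 0.3, and the helper name `gradient_magnitude` are assumptions chosen only to show that a low-valued skin region on a higher-valued background produces large gradient magnitudes along its perimeter and none in its interior.

```python
import numpy as np


def gradient_magnitude(plane):
    """Per-pixel gradient magnitude of a 2-D array (central differences)."""
    # np.gradient returns derivatives along axis 0 (rows) and axis 1 (columns).
    gy, gx = np.gradient(plane.astype(float))
    return np.hypot(gx, gy)


# Hypothetical sH plane: background at 0.8, a lower-valued "face" block at 0.3.
sh_plane = np.full((40, 40), 0.8)
sh_plane[10:30, 12:28] = 0.3

mag = gradient_magnitude(sh_plane)

# Region well inside the face block: flat, so the gradient vanishes.
interior = mag[14:26, 16:24]
# Top edge of the face block: the skin/background transition.
perimeter_row = mag[10, 12:28]
```

A scaling step could then grow or shrink the candidate ROI until its border lines up with the high-gradient perimeter, but the statement does not say how the authors do this, so that part is left out.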
“…To simplify processing within the hue component a 0.2 shift was then applied, resulting in the modified color space, sHSI. More details on color space analysis can be found in our previous work [21].…”
Section: Proposed System 21 Optimal Color Space For Face and Lip Det (mentioning)
confidence: 99%
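As a rough illustration of the hue shift described above, the sketch below adds 0.2 to a hue value and wraps the result. Several points are assumptions rather than details from the statement: hue is taken to be normalized to [0, 1), the shift is applied with wrap-around, and the standard-library HSV hue from `colorsys` stands in for the HSI hue. The function name `shifted_hue` is hypothetical.

```python
import colorsys

HUE_SHIFT = 0.2  # shift value quoted in the citation statement


def shifted_hue(r, g, b, shift=HUE_SHIFT):
    """Hue of an RGB triple (components in [0, 1]), shifted with wrap-around.

    Assumption: hue lies in [0, 1) and the shift wraps modulo 1.
    """
    h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
    return (h + shift) % 1.0


# Reddish tones sit on both sides of the hue wrap point (near 0 and near 1).
red = shifted_hue(1.0, 0.0, 0.0)      # unshifted hue is 0.0
magenta_red = shifted_hue(1.0, 0.0, 0.3)  # unshifted hue is close to 1.0
```

One plausible motivation, not stated in the source, is that such a shift moves reddish hues away from the wrap point so they form one contiguous range that is easier to threshold.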
“…To determine the optimal color space for efficient skin and face detection, various color spaces have been examined, such as RGB, nrgb, YCbCr, YIQ, and HSV in [11]. Manually drawn lip masks were constructed from a database of over 400 images that were subsequently used to develop statistical models of Lip, Non-lip, and Skin classes.…”
Section: Skin Classification Via sHSV Color Space (mentioning)
confidence: 99%