2013 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2013.6638089
Designing relevant features for visual speech recognition

Cited by 13 publications
(9 citation statements)
References 18 publications
“…Note that, in practice, we use string kernel maps for Φ [7]. Details about the design of these kernel maps, out of the main scope of this paper, are deliberately omitted and can be found in [7].…”
Section: Kernel-based Unary Potential (mentioning)
confidence: 99%
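The excerpt above refers to string kernel maps Φ without giving their design, so as a purely illustrative sketch, here is a generic p-spectrum kernel — a common string kernel whose explicit feature map counts length-p substrings — not the actual kernel maps of [7]:

```python
# Illustrative sketch only: a generic p-spectrum string kernel.
# The feature map and kernel below are standard textbook constructions,
# NOT the specific kernel maps designed in reference [7] of the excerpt.

from collections import Counter

def spectrum_map(s, p=2):
    """Explicit feature map Phi(s): counts of all length-p substrings of s."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))

def spectrum_kernel(s, t, p=2):
    """K(s, t) = <Phi(s), Phi(t)>, a dot product of p-gram count vectors."""
    phi_s, phi_t = spectrum_map(s, p), spectrum_map(t, p)
    return sum(count * phi_t[gram] for gram, count in phi_s.items())

# Example: overlap of bigram counts between two symbol strings.
print(spectrum_kernel("abab", "abba"))  # "ab" occurs 2x vs 1x, "ba" 1x vs 1x -> 3
```

Because the kernel is an inner product of explicit count vectors, it is positive semi-definite and can be plugged into any kernel classifier such as an SVM.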
“…Evano and Besacier [8] investigated liveness verification based on an analysis of the synchronicity of visual and audio features, reporting an Equal Error Rate of 14.5% on the XM2VTS dataset. In [10], a liveness verification system using only visual information was proposed, based on speech recognition with an SVM (support vector machine) to recognize individually segmented digits. A speech recognition rate of 68% was reported on the XM2VTS dataset using the approach in [10] with only the visual modality.…”
Section: Introduction (mentioning)
confidence: 99%
“…In [10], a liveness verification system using only visual information was proposed, based on speech recognition with an SVM (support vector machine) to recognize individually segmented digits. A speech recognition rate of 68% was reported on the XM2VTS dataset using the approach in [10] with only the visual modality. In this paper, the aim is to show an improvement over previous works through the use of deep learning.…”
Section: Introduction (mentioning)
confidence: 99%
“…The studies carried out by Benhaim et al. [67] using the CUAVE database reported a speech recognition accuracy of 85% in speaker-independent experiments. For visual features, the approach used histogram-based descriptors around twelve lip landmarks determined with an AAM fitting technique, and classification involved multiple kernel learning with an SVM.…”
Section: Comparison With Other Studies (mentioning)
confidence: 99%
“…Lip reading is the ability or skill to understand speech through information gleaned from the lower part of the face, typically by following lip, tongue, and jaw movement patterns. Speechreading may include lip reading information, but may provide additional understanding of the speech by interpreting whole-face expressions, gestures, and body language [4][5][6], as well as environmental conditions, such as specific characteristics of the speaker and the time and physical location at which the conversation took place [7].…”
Section: Introduction (mentioning)
confidence: 99%