Audiovisual Speech Processing 2012
DOI: 10.1017/cbo9780511843891.011

Audiovisual automatic speech recognition

Cited by 81 publications
(123 citation statements)
“…First of all, a region of interest (ROI) around the mouth, which contains the largest amount of information about the utterance, has to be extracted [23]. This can be done by hand or with the help of a face tracker.…”
Section: Related Work (mentioning)
confidence: 99%
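
The ROI extraction step described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' method: it assumes a face tracker has already produced mouth landmark coordinates, and the function name `crop_mouth_roi`, the `margin_px` parameter, and the list-of-rows image representation are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # grayscale frame stored as rows of pixel values


def crop_mouth_roi(frame: Frame,
                   mouth_points: Sequence[Tuple[int, int]],
                   margin_px: int = 2) -> Frame:
    """Crop a rectangular ROI around mouth landmarks given as (x, y) pairs.

    The bounding box of the landmarks is padded by margin_px pixels on
    every side (an assumed parameter) and clamped to the frame borders.
    """
    height, width = len(frame), len(frame[0])
    xs = [x for x, _ in mouth_points]
    ys = [y for _, y in mouth_points]

    # Padded bounding box, clamped so it never leaves the frame.
    left = max(0, min(xs) - margin_px)
    right = min(width - 1, max(xs) + margin_px)
    top = max(0, min(ys) - margin_px)
    bottom = min(height - 1, max(ys) + margin_px)

    return [row[left:right + 1] for row in frame[top:bottom + 1]]


# Synthetic 10x10 frame whose pixel value encodes its position (y * 10 + x).
frame = [[y * 10 + x for x in range(10)] for y in range(10)]
landmarks = [(3, 4), (6, 4), (4, 5), (5, 6)]  # hypothetical tracker output
roi = crop_mouth_roi(frame, landmarks, margin_px=2)
```

Cropping by hand, the other option the statement mentions, amounts to supplying the landmark coordinates manually instead of taking them from a tracker.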
“…In general, three types of features are used: texture-based features, shape-based features, or a combination of both [23,5]. Texture-based features exploit the pixel values in a ROI, usually closely around the mouth or including the jaws [23].…”
Section: Related Work (mentioning)
confidence: 99%
“…In [8], the results of visual ASR experiments involving the use of the IBM ViaVoice database were presented in their comparison of four types of visual features, namely discrete cosine transform (DCT) [9], discrete wavelet transform (DWT) [10], principal components analysis (PCA) [11], and active appearance models (AAM) [12]. A solution using hidden Markov models (HMMs) [13] as the classifier found that DCT based visual features were the most promising for the recognition task.…”
Section: Introduction (mentioning)
confidence: 99%
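
To make the idea of DCT-based visual features concrete, the sketch below computes an orthonormal 2D DCT-II of a mouth ROI and keeps a small block of low-frequency coefficients as the feature vector. It is an illustration under stated assumptions, not the pipeline evaluated in [8]: the function names, the `keep` parameter, and the choice of a square top-left coefficient block are hypothetical, and the statement above does not say how coefficients were selected.

```python
import math
from typing import List


def dct_1d(values: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1D sequence."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, v in enumerate(values))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(block: List[List[float]]) -> List[List[float]]:
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [list(r) for r in zip(*cols)]


def dct_features(roi: List[List[float]], keep: int = 4) -> List[float]:
    """Flatten the keep x keep lowest-frequency coefficients.

    Assumes keep does not exceed either ROI dimension.
    """
    coeffs = dct_2d(roi)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


# A flat 8x8 ROI has all of its energy in the DC coefficient.
flat_roi = [[10.0] * 8 for _ in range(8)]
features = dct_features(flat_roi, keep=4)
```

In a recognizer such as the HMM-based one mentioned above, a vector like `features` would be computed per video frame and passed on as the observation; that wiring is left out here.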