2012
DOI: 10.1007/s00371-012-0751-7

Automatic visual speech segmentation and recognition using directional motion history images and Zernike moments

Abstract: Appearance-based visual speech recognition using only video signals is presented. The proposed technique is based on the use of directional motion history images (DMHIs), an extension of the popular optical flow method for object tracking. Zernike moments of each DMHI are computed in order to perform the classification. The technique incorporates automatic temporal segmentation of isolated utterances, achieved using pair-wise pixel comparison. Support vector m…
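The DMHI idea from the abstract can be sketched in a few lines: quantize motion into directional components and keep a separate, decaying history image per direction. A minimal numpy sketch, assuming dense optical flow components (u, v) are already computed by some external tracker; the four-direction quantization and the tau, delta, and threshold values are illustrative assumptions, not the paper's exact parameters:

```python
import numpy as np

def update_dmhis(dmhis, u, v, tau=10.0, delta=1.0, mag_thresh=0.5):
    """Update four directional motion history images (up, down, left, right).

    dmhis: array of shape (4, H, W); u, v: optical-flow components, shape (H, W).
    Pixels moving in a given direction are set to tau in that direction's
    history image; all other pixels decay by delta toward zero, so recent
    motion stays brightest. A sketch only; parameters are illustrative.
    """
    mag = np.hypot(u, v)
    moving = mag > mag_thresh
    masks = [
        moving & (v < 0) & (np.abs(v) >= np.abs(u)),  # up
        moving & (v > 0) & (np.abs(v) >= np.abs(u)),  # down
        moving & (u < 0) & (np.abs(u) > np.abs(v)),   # left
        moving & (u > 0) & (np.abs(u) > np.abs(v)),   # right
    ]
    for i, mask in enumerate(masks):
        dmhis[i] = np.where(mask, tau, np.maximum(dmhis[i] - delta, 0.0))
    return dmhis
```

Zernike moments of each resulting history image would then be computed as classification features, as the abstract describes.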

Cited by 7 publications (4 citation statements)
References 38 publications
“…It can be found that the result obtained by the proposed approach is close to the ground truth and the segmentation errors are small. The proposed approach clearly outperforms the method of [38] and is comparable to the methods of [37] and [39]. Importantly, the proposed approach utilizes only the extracted mouth area and does not compute the intensity change of every pixel frame by frame.…”
Mentioning; confidence: 99%
“…Talea et al. [38] first obtained the mouth areas of consecutive frames and then performed a series of mouth-area subtractions with smoothing filtering for syllable separation. Recently, Shaikh et al. [39] have utilized an ad hoc method for temporal viseme segmentation (i.e. 14 different mouth activities) based on pair-wise pixel comparison of consecutive images.…”
Section: B. Lip Motion Segmentation; mentioning
confidence: 99%
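The pair-wise pixel comparison approach quoted above can be sketched as a simple frame-differencing activity detector: count how many pixels change between consecutive mouth-region frames and mark runs of sustained activity as utterance segments. The function name and all thresholds below are hypothetical illustrations, not values taken from [38] or [39]:

```python
import numpy as np

def segment_utterances(frames, diff_thresh=10.0, act_thresh=0.01, min_len=3):
    """Temporal segmentation by pair-wise pixel comparison of consecutive frames.

    frames: array of shape (T, H, W), grayscale mouth-region images.
    Returns a list of (start, end) frame-index pairs where the fraction of
    changed pixels stays above act_thresh for at least min_len frames.
    """
    diffs = np.abs(np.diff(frames.astype(float), axis=0))
    # Fraction of pixels whose intensity changed noticeably per frame pair.
    activity = (diffs > diff_thresh).mean(axis=(1, 2))
    active = activity > act_thresh
    segments, start = [], None
    for t, is_active in enumerate(active):
        if is_active and start is None:
            start = t
        elif not is_active and start is not None:
            if t - start >= min_len:
                segments.append((start, t))
            start = None
    if start is not None and len(active) - start >= min_len:
        segments.append((start, len(active)))
    return segments
```

A smoothing filter over `activity`, as described for [38], could be added before thresholding to suppress spurious single-frame changes.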
“…The result is a scalar-valued image in which more recently moving pixels are brighter [15]. Examples of MHI are presented in Figure 7(b).…”
Section: Detecting and Tracking Pedestrian; mentioning
confidence: 99%
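The MHI described in this excerpt follows the classic Bobick-Davis update rule: pixels flagged as moving are set to a maximum value tau, and all other pixels fade by a fixed step, so the most recent motion appears brightest. A minimal one-step sketch with illustrative parameter values:

```python
import numpy as np

def update_mhi(mhi, prev_frame, curr_frame, tau=255.0, delta=25.0, thresh=15.0):
    """One step of a motion history image update.

    Pixels whose intensity changed by more than thresh between frames are set
    to tau; all other pixels decay by delta toward 0. The result is a
    scalar-valued image in which recently moving pixels are brighter.
    """
    moved = np.abs(curr_frame.astype(float) - prev_frame.astype(float)) > thresh
    return np.where(moved, tau, np.maximum(mhi - delta, 0.0))
```

Iterating this over a video and rendering the result directly gives images like the MHI examples the excerpt refers to.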