Deep learning has proved effective for multimodal speech recognition using frontal face images. In this paper, we propose a new deep learning method, a trimodal deep autoencoder, which takes not only audio signals and face images but also depth images of faces as inputs. We collected continuous speech data from 20 speakers with Kinect 2.0 and used it for evaluation. At 10 dB SNR, our method reduced the error rate of audio-only speech recognition by 30% relative, from 34.6% to 24.2%. It is particularly effective for recognizing certain consonants, such as /k/ and /t/.
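The abstract does not specify the network architecture; as a rough, illustrative sketch of how three modalities might be fused in a single autoencoder (the feature dimensions, concatenation-based fusion, and tanh activation are all assumptions, not details from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature dimensions (not from the paper):
# audio features, face-image pixels, depth-image pixels.
D_AUDIO, D_FACE, D_DEPTH, D_HIDDEN = 39, 64, 64, 32

def init_layer(d_in, d_out):
    # Small random weights and zero biases for one dense layer.
    return rng.normal(0.0, 0.1, (d_in, d_out)), np.zeros(d_out)

# Encoder maps the concatenated trimodal input to a shared latent code;
# the decoder reconstructs all three modalities from that code.
W_enc, b_enc = init_layer(D_AUDIO + D_FACE + D_DEPTH, D_HIDDEN)
W_dec, b_dec = init_layer(D_HIDDEN, D_AUDIO + D_FACE + D_DEPTH)

def forward(audio, face, depth):
    # Fuse modalities by concatenation, encode, then decode.
    x = np.concatenate([audio, face, depth], axis=-1)
    h = np.tanh(x @ W_enc + b_enc)   # shared latent representation
    x_hat = h @ W_dec + b_dec        # linear reconstruction of the input
    return h, x_hat

audio = rng.normal(size=D_AUDIO)
face = rng.normal(size=D_FACE)
depth = rng.normal(size=D_DEPTH)
h, x_hat = forward(audio, face, depth)
print(h.shape, x_hat.shape)  # (32,) (167,)
```

In such a setup, the shared latent code would serve as the noise-robust feature for the downstream speech recognizer; training details (loss, noise injection, layer counts) would follow the paper itself.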