The 14th International Conference on Auditory-Visual Speech Processing 2017
DOI: 10.21437/avsp.2017-7

Using deep neural networks to estimate tongue movements from speech face motion

Abstract: This study concludes a tripartite investigation into the indirect visibility of the moving tongue in human speech, as reflected in co-occurring changes of the facial surface. We were particularly interested in how this shared information is distributed across the range of contributing frequencies. In the current study, we examine the degree to which tongue movements during speech can be reliably estimated from face motion using artificial neural networks. We simultaneously acquired data for both movement types; to…
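The abstract is truncated and does not state the network architecture, so the following is only a minimal sketch of the general task it describes: regressing tongue-marker coordinates on face-marker coordinates with a small feedforward network. Everything here is an assumption for illustration: the synthetic data generator (`make_synthetic_data`), the one-hidden-layer tanh model (`MLPRegressor`), the gradient-descent settings, and the use of per-channel Pearson correlation (`per_channel_correlation`) as the accuracy measure are all hypothetical and are not taken from the study.

```python
import numpy as np


def make_synthetic_data(n_frames=2000, n_face=12, n_tongue=6, seed=0):
    """Hypothetical stand-in for simultaneously recorded face and tongue
    marker trajectories (one row per frame). Not data from the study."""
    rng = np.random.default_rng(seed)
    face = rng.standard_normal((n_frames, n_face))
    mixing = rng.standard_normal((n_face, n_tongue)) / np.sqrt(n_face)
    # Tongue channels are a noisy nonlinear function of the face channels.
    tongue = np.tanh(face @ mixing) + 0.1 * rng.standard_normal((n_frames, n_tongue))
    return face, tongue


class MLPRegressor:
    """Hypothetical one-hidden-layer tanh network trained by full-batch
    gradient descent on mean squared error."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((n_in, n_hidden)) / np.sqrt(n_in)
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.standard_normal((n_hidden, n_out)) / np.sqrt(n_hidden)
        self.b2 = np.zeros(n_out)

    def _forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)  # hidden activations
        return H, H @ self.W2 + self.b2     # linear output layer

    def predict(self, X):
        return self._forward(X)[1]

    def loss(self, X, Y):
        return float(np.mean((self.predict(X) - Y) ** 2))

    def fit(self, X, Y, lr=0.5, epochs=500):
        for _ in range(epochs):
            H, Y_hat = self._forward(X)
            dY = 2.0 * (Y_hat - Y) / Y.size         # gradient of the MSE w.r.t. Y_hat
            dH = (dY @ self.W2.T) * (1.0 - H ** 2)  # backpropagate through tanh (uses W2 before update)
            self.W2 -= lr * (H.T @ dY)
            self.b2 -= lr * dY.sum(axis=0)
            self.W1 -= lr * (X.T @ dH)
            self.b1 -= lr * dH.sum(axis=0)
        return self


def per_channel_correlation(Y_true, Y_pred):
    """Pearson correlation between measured and estimated values, per output channel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))


face, tongue = make_synthetic_data()
split = 1500  # first 1500 frames for training, the rest held out
model = MLPRegressor(face.shape[1], 32, tongue.shape[1])
initial_loss = model.loss(face[:split], tongue[:split])
model.fit(face[:split], tongue[:split])
final_loss = model.loss(face[:split], tongue[:split])
r = per_channel_correlation(tongue[split:], model.predict(face[split:]))
print(f"train MSE {initial_loss:.3f} -> {final_loss:.3f}; held-out r: {np.round(r, 2)}")
```

With real recordings, frames from the same utterance are strongly autocorrelated, so a held-out set would more sensibly be split by utterance than by frame index. One way to probe how shared information depends on frequency, again only as an assumption, would be to band-pass filter both signals before fitting and compare the resulting correlations per band.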

Cited by 1 publication (2 citation statements)
References 22 publications
“…As aforementioned, these imaging modalities are incontrovertibly statistically related each other. In [21], the authors explored the use of deep neural networks to estimate the tongue's motion from the face pictures. In this paper, we follow the task defined in [39] and we aim to picture tongue's motion from the lip images, leveraging the ultrasound tongue imaging.…”
Section: Speech Production Study (mentioning)
“…These imaging modalities are incontrovertibly statistically related each other, such as recent studies suggest that one can picture the corresponding tongue motion from their voice and vice versa [26]. Here, we would like to ask a related question: given an observable image sequences of lips, can we predict the motion of the tongue [21]. The authors in [39] demonstrated that deep learning model [12,20] can reconstruct the tongue's motion from the lip images with satisfactory performance.…”
Section: Introduction (mentioning)