2022
DOI: 10.1609/aaai.v36i1.19966

Flow-Based Unconstrained Lip to Speech Generation

Abstract: Unconstrained lip-to-speech aims to generate the corresponding speech from silent facial videos, with no restriction on head pose or vocabulary. It is desirable to generate intelligible and natural speech at a fast speed in unconstrained settings. Currently, to handle these more complicated scenarios, most existing methods adopt an autoregressive architecture optimized with the MSE loss. Although these methods have achieved promising performance, they are prone to bring issues including high infe…

Cited by 11 publications
(13 citation statements)
References 21 publications
“…Griffin-Lim algorithm [11]). These works outperform previous works by a wide margin under unconstrained settings [15,24].…”
Section: Introduction (mentioning)
confidence: 79%
“…As this topic has only recently attracted attention of the researchers, there are not many works on it currently. Prajwal et al [24] firstly propose an autoregressive sequence-to-sequence model modified from Tacotron 2 [31] to tackle this problem, which generates mel-spectrograms conditioned on video frames; He et al [15] use a non-autoregressive architecture to accelerate inference and use a Glow [19] module for mel-spectrogram refinement.…”
Section: Unconstrained Lip-to-speech Synthesis (mentioning)
confidence: 99%