Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475196
TACR-Net: Editing on Deep Video and Voice Portraits

Cited by 17 publications (6 citation statements). References 35 publications.
“…Comparative study is done between the various deep learning algorithms such as TAC-NETS [43], RESNET-34 [44], MULTI-CNN…”
Section: Comparative Analysis and Discussion (mentioning; confidence: 99%)
“…As touched upon a bit earlier, audio-driven deep fakes can be categorised by whether they are generated by leveraging an audio driven structural representation of the face, or without. There have been numerous approaches over the years relating to the former, ranging from ones such as [2,7,10,16,19,31,39,56,66,68,74,75,79,91] which generate a set of 2D facial landmark co-ordinates from audio, or [8,15,32,37,52,62,63,69,76,77,83-85,87] which predict expression parameters from audio to drive a 3D face model. What these approaches all have in common is that they use these intermediate structural representations as input to a separate neural rendering model which is typically trained as an image to image translation task to generate the final photo realistic image frame.…”
Section: Audio Driven Video Generation (mentioning; confidence: 99%)
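The two-stage structure described in the excerpt above (audio, to an intermediate facial representation, to a neural renderer) can be sketched as follows. This is a toy illustration only: the array shapes, the random linear predictor, and the `render_frame` placeholder are assumptions for the sketch, not the actual TACR-Net architecture or any cited system.

```python
import numpy as np

# Stage 1: predict an intermediate structural representation from audio.
# A fixed random linear map stands in for a trained audio-to-landmark
# network; real systems predict e.g. 68 2D landmarks or 3DMM expression
# parameters per audio frame.
rng = np.random.default_rng(0)
AUDIO_DIM, N_LANDMARKS = 80, 68          # e.g. 80 mel bins -> 68 (x, y) points

W = rng.standard_normal((AUDIO_DIM, N_LANDMARKS * 2)) * 0.01

def audio_to_landmarks(mel_frame: np.ndarray) -> np.ndarray:
    """Map one mel-spectrogram frame to 68 2D landmark coordinates."""
    return (mel_frame @ W).reshape(N_LANDMARKS, 2)

# Stage 2: a neural renderer consumes the structural representation plus a
# reference frame and emits a photorealistic frame (trained as an
# image-to-image translation task). This placeholder just rasterises the
# landmarks onto the reference image.
def render_frame(reference: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
    frame = reference.copy()
    h, w = frame.shape[:2]
    # squash landmarks into pixel coordinates and mark them
    xy = (np.tanh(landmarks) * 0.5 + 0.5) * [w - 1, h - 1]
    for x, y in xy.astype(int):
        frame[y, x] = 255
    return frame

reference = np.zeros((64, 64), dtype=np.uint8)
mel = rng.standard_normal(AUDIO_DIM)
frame = render_frame(reference, audio_to_landmarks(mel))
print(frame.shape, frame.max())  # (64, 64) 255
```

The point the excerpt makes is the decoupling: stage 1 can be swapped between 2D-landmark and 3DMM-parameter predictors without retraining the stage-2 renderer's interface.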
“…• Talking-Head Video Synthesis. In talking-head video synthesis, some pipelines [74,38,85,46,67,47,72] for high-quality face synthesis usually extracts the 3D face parameters from the target face images through 3D face models [6,73,24], and generates the 3D face parameters from source speech or text, and then generates the face images from the generated 3D face parameters. • Image/Video/Sound Generation.…”
Section: Applications Of Regeneration Learning (mentioning; confidence: 99%)
“…• The source data X and target data Y have too much uncorrelated information (i.e., X ∩ Y ≪ X ∪ Y), such as lyric/video and melody in conditional melody generation [39,81,15,92,19], speech and face images in talking-head video synthesis [74,38,85,46,67,47]. Directly learning the mapping between X and Y would lead to overfitting.…”
Section: Applications Of Regeneration Learning (mentioning; confidence: 99%)
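The condition quoted above, that the shared information X ∩ Y is much smaller than X ∪ Y, can be made concrete with a toy set-overlap check. The "information tokens" below are illustrative assumptions, not a real information-theoretic measure.

```python
# Toy illustration of the regeneration-learning condition X ∩ Y ≪ X ∪ Y:
# when source and target share only a small fraction of their information,
# learning through a compact intermediate representation is preferred over
# a direct X -> Y mapping.
def overlap_ratio(x: set, y: set) -> float:
    """|X ∩ Y| / |X ∪ Y| (Jaccard similarity)."""
    return len(x & y) / len(x | y)

# crude "information tokens" for speech vs. face images in talking-head synthesis
speech = {"phonemes", "prosody", "timbre", "noise", "speaker-id"}
face   = {"expression", "pose", "lighting", "identity", "phonemes"}

r = overlap_ratio(speech, face)
print(f"overlap = {r:.2f}")  # only 'phonemes' is shared -> 1/9 ≈ 0.11
```

Under this reading, the intermediate representation (e.g. 3D face parameters) captures exactly the small shared part, so the mapping learned from X discards the uncorrelated remainder instead of overfitting to it.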