2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.01036

Diffusion Autoencoders: Toward a Meaningful and Decodable Representation

Cited by 163 publications
(113 citation statements)
References 9 publications
“…In this setting, we shift and scale hidden states of the UNet with the information not only from the time encoding but the audio embedding as well. We found this approach works better in comparison to other conditioning methods, such as using just an additional scale on top of Equation (14) [25], and applying a multihead attention mechanism with queries being a function of the audio embedding [30].…”
Section: Speech Conditioning
Type: mentioning
confidence: 92%
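
The shift-and-scale conditioning described in the statement above can be illustrated with a minimal sketch. The citing paper's Equation (14) is not reproduced on this page, so everything here is an assumption made for illustration: the function name `modulate`, the projection matrices `w_scale` and `w_shift`, the `1 + scale` parameterization, and the choice to sum the time encoding with the audio embedding.

```python
import numpy as np


def modulate(hidden, time_emb, audio_emb, w_scale, w_shift):
    """Shift and scale hidden states with a combined conditioning vector.

    hidden:    (channels, length) feature map from a hypothetical UNet block
    time_emb:  (cond_dim,) timestep encoding
    audio_emb: (cond_dim,) audio embedding
    w_scale, w_shift: (channels, cond_dim) hypothetical projection matrices
    """
    # Assumption: the two conditioning signals are combined by addition.
    cond = time_emb + audio_emb
    scale = w_scale @ cond  # one scale per channel, shape (channels,)
    shift = w_shift @ cond  # one shift per channel, shape (channels,)
    # Broadcast the per-channel scale and shift along the length axis.
    return hidden * (1.0 + scale[:, None]) + shift[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(4, 8))
    time_emb = rng.normal(size=3)
    audio_emb = rng.normal(size=3)
    w_scale = rng.normal(size=(4, 3))
    w_shift = rng.normal(size=(4, 3))
    print(modulate(hidden, time_emb, audio_emb, w_scale, w_shift).shape)
```

With zero projection weights the function returns the hidden states unchanged, which is one common reason for the `1 + scale` form; the citing paper may parameterize it differently.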
“…Diffusion autoencoders were first introduced by Preechakul et al. (2022), as a way to condition the diffusion process on a compressed latent vector of the input itself. Diffusion can act as a more powerful generative decoder, and hence the input can be reduced to latents with higher compression ratios.…”
Section: Diffusion Magnitude-Autoencoding (DMAE)
Type: mentioning
confidence: 99%
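
The idea in the statement above, conditioning a diffusion decoder on a compressed latent of the input, can be sketched as a toy training objective. This is not the architecture of Preechakul et al. or of the citing paper: the linear encoder and denoiser, all function and weight names, and the standard DDPM-style noising formula are assumptions chosen to keep the example small.

```python
import numpy as np


def encode(x0, w_enc):
    """Compress the clean input x0 into a lower-dimensional latent z."""
    return w_enc @ x0


def add_noise(x0, eps, alpha_bar):
    """Forward noising, assuming the usual DDPM form
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def predict_noise(x_t, z, w_x, w_z):
    """Toy linear denoiser that sees the noisy input and the latent z."""
    return w_x @ x_t + w_z @ z


def training_loss(x0, eps, alpha_bar, w_enc, w_x, w_z):
    """Mean squared error between the true and the predicted noise."""
    z = encode(x0, w_enc)                # latent of the input itself
    x_t = add_noise(x0, eps, alpha_bar)  # corrupted input
    eps_hat = predict_noise(x_t, z, w_x, w_z)
    return float(np.mean((eps_hat - eps) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=8)
    eps = rng.normal(size=8)
    w_enc = rng.normal(size=(2, 8))  # 8 input values compressed to 2
    w_x = rng.normal(size=(8, 8))
    w_z = rng.normal(size=(8, 2))
    print(training_loss(x0, eps, 0.5, w_enc, w_x, w_z))
```

In this sketch the encoder and the denoiser would be trained jointly on the same loss, so the latent `z` is pushed to carry whatever information helps the decoder reconstruct the input.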
“…Denoising diffusion models [60,64] have seen great success on a wide variety of different challenges, ranging from image2image translation tasks like inpainting, colorisation, image upscaling, uncropping [6,26,41,42,50,53,57,59], audio generation [11,28,33,35,38,48,67,80], text-based image generation [4,21,23,46,51,55,58], video generation [24,27,82,86], and many others. For a thorough review on diffusion models and all of their recent applications, we recommend [81].…”
Section: Diffusion Models
Type: mentioning
confidence: 99%