2022
DOI: 10.48550/arxiv.2212.09478
Preprint
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Cited by 1 publication (1 citation statement). References 0 publications.
“…Our model outperforms alternatives in all possible scenarios, even those that are notoriously difficult because the modalities might be only loosely correlated. We note that recent works have explored the joint generation of multiple modalities [33,34]; however, such approaches are application-specific, e.g., text-to-image, and essentially only target two modalities. When relevant, we compare our method to additional recent alternatives to multimodal diffusion [35,36] and show the superior performance of MLD.…”
Section: Introduction (mentioning)
confidence: 99%