SIGGRAPH Asia 2022 Conference Papers
DOI: 10.1145/3550469.3555392
CLIP-Mesh: Generating textured meshes from text using pretrained image-text models

Cited by 114 publications (31 citation statements)
References 9 publications
“…More related to our work, CLIP-Mesh [20] and Dream Fields [16] do so by using the CLIP embedding and can condition 3D generation on text. Our model is built on the recent Dream Fusion approach [33], which builds on a similar idea using a diffusion model as prior.…”
Section: Related Work (mentioning)
confidence: 99%
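For readers unfamiliar with the diffusion-prior idea this excerpt refers to, DreamFusion-style methods typically optimize the 3D representation with a score-distillation update: a rendered view is noised, a frozen text-conditioned diffusion model predicts that noise, and the residual is pushed back through the differentiable renderer only. The sketch below is a minimal, hedged illustration of that update; `render`, `diffusion_eps`, `text_emb`, and the toy noise schedule are illustrative assumptions, not code from the cited papers.

```python
import torch

def sds_step(theta_params, render, diffusion_eps, text_emb, optimizer,
             num_timesteps=1000):
    """One score-distillation step in the spirit of DreamFusion [33].

    Hedged sketch: `render` (differentiable renderer), `diffusion_eps` (frozen
    text-conditioned noise predictor) and `text_emb` are hypothetical stand-ins.
    """
    image = render(theta_params)                      # differentiable render of the 3D scene
    t = torch.randint(1, num_timesteps, (1,)).float() # random diffusion timestep
    alpha_bar = torch.cos(t / num_timesteps * torch.pi / 2) ** 2  # toy noise schedule
    noise = torch.randn_like(image)
    noisy = alpha_bar.sqrt() * image + (1.0 - alpha_bar).sqrt() * noise

    with torch.no_grad():                             # the diffusion prior stays frozen
        eps_pred = diffusion_eps(noisy, t, text_emb)

    # Score distillation: treat (eps_pred - noise) as the gradient w.r.t. the
    # rendered image and backpropagate it through the renderer only.
    surrogate = ((eps_pred - noise).detach() * image).sum()
    optimizer.zero_grad()
    surrogate.backward()
    optimizer.step()
```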
“…3D Generation Guided by Text-Image Models: Another line of work leverages image-text models trained on massive datasets to provide supervision for 3D generation. For instance, methods like DreamField [13], CLIP-Mesh [14] and PureCLIPNeRF [17] learn to generate 3D models by optimizing the CLIP similarity scores between rendered images and the input text prompts. Instead of relying on CLIP which might be insufficient to capture high frequency details, more recent methods [26,18,34] use text-to-image generation models to provide alignment supervision between text prompts and rendered images, and generate more photo-realistic 3D models.…”
Section: Related Work (mentioning)
confidence: 99%
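The CLIP-based supervision described in this excerpt can be made concrete with a short sketch: the loss is the negative cosine similarity between CLIP's embedding of a rendered view and its embedding of the prompt, so gradients flow through the differentiable renderer into the 3D parameters. This is a minimal illustration assuming the OpenAI `clip` package; `render_views` and the commented optimization loop are hypothetical stand-ins, not code from CLIP-Mesh, Dream Fields, or PureCLIPNeRF.

```python
import torch
import clip  # OpenAI CLIP package (https://github.com/openai/CLIP); assumed installed

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float().eval()   # fp32 so gradients from the renderer are stable

# Encode the prompt once; it stays fixed during optimization.
text_tokens = clip.tokenize(["a wooden chair"]).to(device)
with torch.no_grad():
    text_feat = clip_model.encode_text(text_tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

def clip_guidance_loss(rendered_views):
    """Negative mean cosine similarity between rendered views and the prompt.

    `rendered_views`: (B, 3, 224, 224) tensor, already resized/normalized the
    way CLIP expects; gradients flow back into whatever produced it.
    """
    img_feat = clip_model.encode_image(rendered_views)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    return -(img_feat @ text_feat.T).mean()

# Hypothetical outer loop: `render_views(params)` stands in for a differentiable
# renderer (mesh rasterizer, NeRF, ...) producing views from the 3D parameters.
# optimizer = torch.optim.Adam(params, lr=1e-2)
# for step in range(1000):
#     loss = clip_guidance_loss(render_views(params))
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```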
“…Neural fields are an intriguing way to fully generate 3D models because, unlike meshes, they don't depend on topological properties such as genus and subdivision [28,15]. Initial generative text-to-3D attempts with NeRF backbones took advantage of a robust language model [53] to align each rendered view on some textual condition [21,73].…”
Section: Related Work (mentioning)
confidence: 99%