2021
DOI: 10.48550/arxiv.2110.02624
Preprint

CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation

Cited by 9 publications (11 citation statements)
References 0 publications
“…Closer to our method are works that utilize the richness of CLIP outside the imagery domain. In the 3D domain, CLIP's latent space provides a useful objective that enables semantic manipulation [Sanghi et al. 2021; Michel et al. 2021; Wang et al. 2021a], where the domain gap is closed by neural rendering. CLIP has even been adopted in temporal domains [Guzhov et al. 2021; Luo et al. 2021; Fang et al. 2021] that leverage large datasets of video sequences paired with text and audio.…”
Section: CLIP Aided Methods (mentioning)
confidence: 99%
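The pattern this excerpt describes can be summarized in a short sketch: render the 3D content with a differentiable renderer, embed both the renders and the text prompt with CLIP, and minimize the cosine distance between the two embeddings. The sketch below assumes the openai/CLIP package, leaves the renderer abstract, and assumes the renders are already sized and normalized for CLIP; the function name and these assumptions are illustrative, not taken from any of the cited papers.

```python
# Minimal sketch of a CLIP-based objective for text-driven 3D synthesis or
# manipulation: embed rendered views and a text prompt, minimize cosine distance.
# Assumes the openai/CLIP package; the differentiable renderer is left abstract.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # avoid fp16/fp32 mismatches when backpropagating renders
model.eval()

def clip_guidance_loss(rendered: torch.Tensor, prompt: str) -> torch.Tensor:
    """rendered: (N, 3, 224, 224) views from a differentiable renderer,
    assumed here to be already resized and normalized the way CLIP expects."""
    text = clip.tokenize([prompt]).to(rendered.device)
    image_feat = model.encode_image(rendered)   # (N, 512) for ViT-B/32
    text_feat = model.encode_text(text)         # (1, 512)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    # 1 - cosine similarity, averaged over the rendered views
    return (1.0 - (image_feat * text_feat).sum(dim=-1)).mean()
```

Gradients flow through the CLIP image encoder back into the renderer's parameters, which is how these works close the gap between the 3D domain and CLIP's image-text space.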
“…Benefiting from the zero-shot ability of CLIP, many impressive zero-shot text-driven applications [Frans et al. 2021; Patashnik et al. 2021] are being developed. Combining CLIP with 3D representations such as NeRF or meshes, zero-shot text-driven 3D object generation [Jain et al. 2021a; Jetchev 2021; Michel et al. 2021; Sanghi et al. 2021] and manipulation [Wang et al. 2021a] have also been realized in recent months.…”
Section: Related Work (mentioning)
confidence: 99%
“…To illustrate the difficulty of direct optimization, we set two baselines that directly optimize the SMPL parameters 𝜃 and the latent code 𝑧 of VPoser. Moreover, inspired by CLIP-Forge [Sanghi et al. 2021], we use Real NVP [Dinh et al. 2016] to obtain a bijection between a normal distribution and the latent-space distribution of VPoser. This normalizing flow network is conditioned on the CLIP features.…”
Section: Baselines (mentioning)
confidence: 99%
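As a rough illustration of the CLIP-Forge-style baseline this excerpt describes, the following sketch shows a single RealNVP affine coupling layer whose scale and shift networks are conditioned on a CLIP feature vector, giving an invertible map between a Gaussian base distribution and a latent space such as VPoser's. All names, dimensions, and the module structure are assumptions for illustration, not the authors' implementation.

```python
# A conditional RealNVP affine coupling layer: the scale/shift nets see one
# half of the latent code plus a CLIP condition vector, so the flow maps a
# Gaussian to a latent space (e.g. VPoser's) guided by text features.
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    def __init__(self, dim: int = 32, cond_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, z, cond):
        """Latent -> base direction; returns transformed z and log|det J|."""
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(torch.cat([z1, cond], dim=-1)).chunk(2, dim=-1)
        s = torch.tanh(s)                      # keep scales bounded for stability
        z2 = z2 * torch.exp(s) + t
        return torch.cat([z1, z2], dim=-1), s.sum(dim=-1)

    def inverse(self, z, cond):
        """Base -> latent direction, used at generation time."""
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(torch.cat([z1, cond], dim=-1)).chunk(2, dim=-1)
        s = torch.tanh(s)
        z2 = (z2 - t) * torch.exp(-s)
        return torch.cat([z1, z2], dim=-1)

# At generation time, several such couplings (with the halves permuted between
# layers) are stacked; sampling N(0, I) and applying the inverse pass,
# conditioned on the CLIP embedding of the prompt, yields a latent code.
```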
“…Concurrent work uses CLIP to fine-tune a pre-trained StyleGAN [11], and for image stylization [6]. Another concurrent work uses the ShapeNet dataset [5] and CLIP to perform unconditional 3D voxel generation [48]. The above techniques leverage a pre-trained generative network or a dataset to avoid the degenerate solutions common when using CLIP for synthesis.…”
Section: Related Work (mentioning)
confidence: 99%