CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation

Sanghi, Aditya; Chu, Hsin‐Sen; Lambourne, Joseph G.; Wang, Ye; Cheng, Chin-Yi; Fumero, Marco; Malekshan, Kamal Rahimi

doi:10.48550/arxiv.2110.02624

Cited by 9 publications

(11 citation statements)

References 0 publications

“…Closer to our method are works that utilize the richness of CLIP outside the imagery domain. In the 3D domain, CLIP's latent space provides a useful objective that enables semantic manipulation [Sanghi et al 2021; Michel et al 2021; Wang et al 2021a] where the domain gap is closed by a neural rendering. CLIP is even adopted in temporal domains [Guzhov et al 2021; Luo et al 2021; Fang et al 2021] that utilize large datasets of video sequences that are paired with text and audio.…”

Section: Clip Aided Methods (mentioning)

confidence: 99%

MotionCLIP: Exposing Human Motion Generation to CLIP Space

Tevet, Gordon, Hertz et al. 2022

Preprint

“…Benefiting from the zero-shot ability of CLIP, many amazing zero-shot text-driven applications [Frans et al 2021; Patashnik et al 2021] are being developed. Combining CLIP with 3D representations like NeRF or mesh, zero-shot text-driven 3D object generation [Jain et al 2021a; Jetchev 2021; Michel et al 2021; Sanghi et al 2021] and manipulation [Wang et al 2021a] have also come true in recent months.…”

Section: Related Work (mentioning)

confidence: 99%

“…To illustrate the difficulty of direct optimization, we set two baselines that directly optimize on SMPL parameter 𝜃 and latent code 𝑧 in VPoser. Moreover, inspired by CLIP-Forge [Sanghi et al 2021], we use Real NVP [Dinh et al 2016] to get a bi-projection between the normal distribution and latent space distribution of VPoser. This normalization flow network is conditioned on the CLIP features.…”

Section: Baselines (mentioning)

confidence: 99%

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

Hong, Zhang, Peng et al. 2022

Preprint

…texture generation. Moreover, by leveraging the priors learned in the motion VAE, a CLIP-guided reference-based motion synthesis method is proposed for the animation of the generated 3D avatar. Extensive qualitative and quantitative experiments validate the effectiveness and generalizability of AvatarCLIP on a wide range of avatars. Remarkably, AvatarCLIP can generate unseen 3D avatars with novel animations, achieving superior zero-shot capability. Codes are available at https://github.com/hongfz16/AvatarCLIP.

“…Concurrent work uses CLIP to fine-tune a pre-trained StyleGAN [11], and for image stylization [6]. Another concurrent work uses the ShapeNet dataset [5] and CLIP to perform unconditional 3D voxel generation [48]. The above techniques leverage a pre-trained generative network or a dataset to avoid the degenerate solutions common when using CLIP for synthesis.…”

Section: Related Work (mentioning)

confidence: 99%