2022
DOI: 10.48550/arxiv.2209.04145
Preprint

ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation

Abstract: Text-guided 3D shape generation remains challenging due to the absence of large paired text-shape datasets, the substantial semantic gap between these two modalities, and the structural complexity of 3D shapes. This paper presents a new framework called Image as Stepping Stone (ISS) for the task, introducing 2D images as a stepping stone to connect the two modalities and to eliminate the need for paired text-shape data. Our key contribution is a two-stage feature-space-alignment approach that maps CLIP feature…

citations
Cited by 2 publications
(2 citation statements)
references
References 9 publications
“…FID utilises Frechet distance between the two image distributions, and assumes that the two are Gaussian. This is applied to 2D assets [7], [64], [105], [106], [107], [108], [109], 3D assets via 3D classifiers [37], [110] or rasterisation to 2D form [111], [112]. The Frechet point cloud distance extends FID for applications in assessing the similarity of point-based 3D shapes [113]. This has been used to evaluate many deep-learning based 3D point-cloud [71], [113], [114] and mesh [67] generators.…”
Section: Perceptual Similarity Metrics (mentioning)
confidence: 99%
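
The Gaussian assumption in the statement above is what gives the metric a closed form: for two Gaussians with means μ1, μ2 and covariances Σ1, Σ2, the squared Frechet distance is ||μ1 − μ2||² + Tr(Σ1 + Σ2 − 2(Σ1Σ2)^(1/2)). The sketch below is only an illustration of that formula and is not taken from any of the cited papers. It assumes diagonal covariances (a simplification; FID as normally computed uses full covariance matrices over Inception features), in which case the trace term reduces to a sum of squared differences of per-dimension standard deviations. The function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def gaussian_stats(samples):
    """Per-dimension mean and population variance of equal-length feature vectors."""
    columns = list(zip(*samples))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def squared_frechet_distance_diag(samples_a, samples_b):
    """Squared Frechet distance between two Gaussians fitted to the samples.

    ASSUMPTION: covariances are treated as diagonal, so
    Tr(S1 + S2 - 2 (S1 S2)^(1/2)) becomes sum_i (sigma1_i - sigma2_i)^2.
    """
    mu_a, var_a = gaussian_stats(samples_a)
    mu_b, var_b = gaussian_stats(samples_b)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_a, var_b))
    return mean_term + cov_term


# Toy 2-D "features": same spread, means shifted by 3 along the first axis.
features_a = [[0.0, 0.0], [2.0, 2.0]]
features_b = [[3.0, 0.0], [5.0, 2.0]]
distance = squared_frechet_distance_diag(features_a, features_b)
```

In this toy case only the mean term contributes, since both sets have identical per-dimension variance. A full-covariance version would need a matrix square root, which is outside the standard library.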
“…Some notable examples of such methods include CLIP-NeRF [58], PureCLIPNeRF [24], and Dream-Fields [19]. Additionally, recent studies have explored the fusion of CLIP with other algorithms, such as ISS [27] with SVR [37], CLIP-Forge [50] using a normalizing flow network [10], and AvatarCLIP [15] leveraging SMLP [30]. Furthermore, the diffusion model [48] has recently demonstrated impressive results in text-to-image generation, leading to its integration into the text-to-3D generation process.…”
Section: Text-to-3D Manipulation/Generation (mentioning)
confidence: 99%