2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00836
PointCLIP: Point Cloud Understanding by CLIP

Cited by 212 publications (67 citation statements)
References 34 publications
“…The results are shown in Table 3. Following PointCLIP (Zhang et al, 2022c), we use prompt templates with the category label to generate text features. From Table 3, it can be seen that our RECON surpasses all the zero-shot methods with CNN-based or Transformer-based backbones.…”

Section: Methods
Mentioning confidence: 99%
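The prompt-template step this statement describes is the standard CLIP zero-shot recipe: each category label is inserted into a set of templates, encoded, and averaged into one text embedding per class. A minimal sketch, assuming OpenAI's open-source clip package (pip install git+https://github.com/openai/CLIP.git); the template strings and category names below are illustrative placeholders, not the exact ones used by PointCLIP or RECON.

```python
import torch
import clip

model, _ = clip.load("ViT-B/32")
templates = ["a photo of a {}.", "a depth map of a {}."]  # hypothetical templates
categories = ["airplane", "chair", "lamp"]                # hypothetical labels

with torch.no_grad():
    text_features = []
    for name in categories:
        tokens = clip.tokenize([t.format(name) for t in templates])
        feats = model.encode_text(tokens)                 # (num_templates, dim)
        feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize
        text_features.append(feats.mean(dim=0))           # average over templates
    text_features = torch.stack(text_features)            # (num_classes, dim)
```

Averaging over several templates rather than using a single prompt tends to smooth out prompt-specific noise in the text embedding, which is why this pattern recurs across CLIP-based zero-shot work.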
“…CrossPoint (Afham et al, 2022) uses both inter- and intra-modal contrastive learning. PointCLIP (Zhang et al, 2022c) realizes image-point alignment by projecting point clouds to 2D depth images. Different from these methods, our work focuses on cross-modal contrastive learning by global feature alignment like Radford et al (2021), which is guided by generative modeling (Pang et al, 2022).…”

Section: Related Work
Mentioning confidence: 99%
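The projection underlying PointCLIP's image-point alignment can be sketched as a simple depth rendering: scatter each point into a pixel grid and keep the nearest depth per pixel. This is a hedged orthographic illustration of the idea, not the paper's exact projection; the image size and normalization are assumptions.

```python
import numpy as np

def project_to_depth_image(points: np.ndarray, size: int = 224) -> np.ndarray:
    """points: (N, 3) array, viewed along the +z axis."""
    depth = np.zeros((size, size), dtype=np.float32)
    # Map x, y into pixel coordinates in [0, size).
    xy = points[:, :2]
    xy = (xy - xy.min(0)) / (xy.max(0) - xy.min(0) + 1e-8)
    u = (xy[:, 0] * (size - 1)).astype(int)
    v = (xy[:, 1] * (size - 1)).astype(int)
    # Normalize z so nearer points are brighter, and keep the largest
    # value when several points land in the same pixel.
    z = points[:, 2]
    z = 1.0 - (z - z.min()) / (z.max() - z.min() + 1e-8)
    np.maximum.at(depth, (v, u), z)
    return depth
```

In practice the single-channel depth map is repeated to three channels so it matches the input expected by CLIP's visual encoder.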
“…With a high-capacity text encoder and visual encoder, CLIP (Radford et al, 2021) aligns text and visual elements in a heterogeneous embedding space. Several studies extend the applicability of CLIP to other tasks such as object detection (Gu et al, 2021), semantic segmentation (Wang et al, 2022; Li et al, 2022; Zhang et al, 2022) and image editing (Patashnik et al, 2021; Gal et al, 2021). Among them, PointCLIP (Zhang et al, 2022) is most related to our work. It learns transferable visual concepts by leveraging CLIP's pre-trained knowledge for zero-shot point cloud recognition.…”

Section: Related Work
Mentioning confidence: 95%
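The zero-shot recognition this statement attributes to PointCLIP reduces to cosine similarity between encoded depth views and the per-class text embeddings. A hedged sketch, reusing text_features from the earlier snippet and assuming views are depth maps already repeated to three channels and preprocessed to 224x224; the fixed logit scale and uniform view averaging are simplifications of PointCLIP's weighted inter-view aggregation.

```python
import torch

def zero_shot_logits(model, views: torch.Tensor, text_features: torch.Tensor):
    """views: (num_views, 3, 224, 224) preprocessed depth images."""
    with torch.no_grad():
        image_features = model.encode_image(views)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        # Cosine similarity against each class embedding, per view.
        logits = 100.0 * image_features @ text_features.T  # (num_views, num_classes)
        return logits.mean(dim=0)                          # (num_classes,)
```

The predicted class is then simply the argmax over the averaged logits; no point-cloud-specific training is involved, which is what makes the pipeline zero-shot.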
“…The seminal work CLIP [66] learns a joint language-vision embedding using more than 400 million text-image pairs. The learned representation is semantically meaningful and expressive, and has thus been adapted to various downstream tasks [82, 71, 49, 79, 65]. In this work, we adopt CLIP in the retrieval process.…”

Section: Related Work
Mentioning confidence: 99%
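The retrieval use of CLIP mentioned here amounts to ranking a gallery by similarity to an encoded query. A minimal sketch under the same clip-package assumption; gallery is a hypothetical tensor of preprocessed images, not a real dataset.

```python
import torch
import clip

model, preprocess = clip.load("ViT-B/32")

def retrieve(query: str, gallery: torch.Tensor, top_k: int = 5):
    """gallery: (M, 3, 224, 224) images already run through `preprocess`."""
    with torch.no_grad():
        q = model.encode_text(clip.tokenize([query]))
        q = q / q.norm(dim=-1, keepdim=True)
        g = model.encode_image(gallery)
        g = g / g.norm(dim=-1, keepdim=True)
        scores = (g @ q.T).squeeze(-1)        # cosine similarity per image
        return scores.topk(top_k).indices     # indices of best matches
```

Because both encoders map into the same embedding space, the same function works unchanged for image-to-image retrieval by swapping encode_text for encode_image on the query side.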