2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.01121

Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning

Abstract: In this work, we tackle the problem of single image-based 3D shape retrieval (IBSR), where we seek to find the most matched shape of a given single 2D image from a shape repository. Most of the existing works learn to embed 2D images and 3D shapes into a common feature space and perform metric learning using a triplet loss. Inspired by the great success in recent contrastive learning works on self-supervised representation learning, we propose a novel IBSR pipeline leveraging contrastive learning. We note that …
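The triplet-loss baseline that the abstract attributes to existing works can be sketched roughly as follows. This is a minimal illustration and not the paper's pipeline: the function names `cosine`, `triplet_loss`, and `retrieve`, the cosine-distance hinge, the margin of 0.2, and the toy 2D embeddings are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: the positive should be closer to the anchor than the
    negative by at least `margin` (margin value is an assumption)."""
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def retrieve(image_emb, shape_embs):
    """Index of the shape whose embedding is most similar to the image."""
    return max(range(len(shape_embs)),
               key=lambda i: cosine(image_emb, shape_embs[i]))

# Toy embeddings in a shared 2D feature space (hypothetical values).
image = [1.0, 0.0]
shapes = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]

best = retrieve(image, shapes)
loss_easy = triplet_loss(image, shapes[1], shapes[2])  # matched shape is positive
loss_hard = triplet_loss(image, shapes[0], shapes[1])  # unrelated shape is positive
```

In this toy setup the image retrieves the shape at index 1. The triplet whose positive is that shape incurs no loss, while the triplet whose positive is an unrelated shape is penalized.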

Citations: cited by 29 publications (5 citation statements)
References: 47 publications (66 reference statements)
“…Datasets. To validate our proposed method, we perform experiments on three datasets: ModelNet40 [36], MI3DOR [43], Pix3D [24], and Pix3D with four categories (Pix3D-4, a subset of the Pix3D dataset created by [13]). The ModelNet40 dataset is a 3D object benchmark and contains…”
Section: Methods · Citation type: mentioning
confidence: 99%
“…The advantage of using a set of 2D view representations is that they can directly employ the existing powerful CNNs for feature extraction [16,23] and reduce the domain gap between 3D models and images. Lin et al [13] used contrastive learning to realize instance-level 3D shape retrieval based on a single image. So far, great progress has been made in IBSR tasks.…”
Section: Related Work · Citation type: mentioning
confidence: 99%
“…Contrastive learning has shown its powerful ability in self-supervised training (Chen et al 2020; He et al 2020; Oord, Li, and Vinyals 2018), which facilitates representation learning by pulling features that are semantically similar and pushing away features that are semantically different. Recently, contrastive learning has been explored under several multi-modal learning scenarios, including vision-language (Radford et al 2021; Wen et al 2021; Yuan et al 2021; Zhang et al 2021; Bakkali et al 2022), video-text (Yang, Bisk, and Zolfaghari et al 2021), image-point cloud (Lin et al 2021; Afham et al 2022; Liu et al 2021b) etc. By aligning multi-modal features, it takes advantage of modality complementary to improve model performance.…”
Section: Contrastive Learning · Citation type: mentioning
confidence: 99%
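The pull-together and push-apart behavior described in the statement above is commonly implemented with an InfoNCE-style loss. The sketch below is one generic form of that loss for the image-to-shape direction only, and it is not taken from any of the cited papers: the function name `info_nce`, the temperature of 0.1, and the toy similarity matrices are assumptions.

```python
import math

def info_nce(sim, temperature=0.1):
    """Mean cross-entropy of selecting the matched pair in each row.

    sim[i][j] is the similarity between image i and shape j, and the
    pair (i, i) is assumed to be the positive for row i.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(sim)

# Toy similarity matrices for a batch of two image/shape pairs (hypothetical).
aligned = [[1.0, 0.0], [0.0, 1.0]]      # positives score highest
misaligned = [[0.0, 1.0], [1.0, 0.0]]   # negatives score highest
uniform = [[0.5, 0.5], [0.5, 0.5]]      # no preference at all

loss_aligned = info_nce(aligned)
loss_misaligned = info_nce(misaligned)
loss_uniform = info_nce(uniform)
```

The loss is close to zero when each image is most similar to its own shape, large when it is most similar to the wrong shape, and equal to the log of the batch size when all similarities are equal.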
“…Consequently, different types of approaches have been proposed to handle this problem [1], [2], [3], [4], and several works have attempted to recover 3D information from 2D images (rendered view [5], [6], [7], [8], [9], scene [10], [11], [12], sketch [13], [14], [15]). In addition, some cross-modal 3D retrieval methods [16], [17], [18] are used to search and match the 3D models in databases, which reduces the difficulty of acquiring models, but still falls short of human expectations in terms of the accuracy and matching requirements.…”
Section: Introduction · Citation type: mentioning
confidence: 99%