Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning

Lin, Ming-Xian; Yang, Jie; He, Wenbo; Lai, Yu‐Kun; Jia, Rongfei; Zhao, Binqiang; Gao, Lin

doi:10.1109/iccv48922.2021.01121

Cited by 29 publications

(5 citation statements)

References 47 publications

(66 reference statements)


“…Datasets. To validate our proposed method, we perform experiments on three datasets: ModelNet40 [36], MI3DOR [43], Pix3D [24], and Pix3D with four categories (Pix3D-4, a subset of the Pix3D dataset created by [13]). The ModelNet40 dataset is a 3D object benchmark and contains…”

Section: Methods (mentioning)

confidence: 99%

“…The advantage of using a set of 2D view representations is that they can directly employ the existing powerful CNNs for feature extraction [16,23] and reduce the domain gap between 3D models and images. Lin et al [13] used contrastive learning to realize instance-level 3D shape retrieval based on a single image. So far, great progress has been made in IBSR tasks.…”

Section: Related Work (mentioning)

confidence: 99%

“…The common embedding space aims to minimize the intra-class distance within the same category and maximize the inter-class distance between different categories. Recently, researchers have employed contrastive loss [9,13], cross-modal center loss [11], softmax cross-entropy loss [12], and other contrastive learning and metric learning methods to learn the common embedding space that characterizes multi-modal data features. To minimize the intra-class distance, researchers use the cross-modal center and cross-entropy loss to learn a common embedding space for all modality data by calculating the class center (mean of the feature) for different classes [11].…”

Section: Introduction (mentioning)

confidence: 99%
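The class-center idea described in the statement above (take the mean feature of each class across modalities, then pull every feature toward its class center) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation from any of the cited papers: the function names `class_centers` and `center_loss`, the 0.5 scaling, and the toy features are all hypothetical.

```python
import numpy as np

def class_centers(features, labels, num_classes):
    """Mean feature vector per class, pooled over all modalities (assumed setup)."""
    centers = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            centers[c] = features[mask].mean(axis=0)
    return centers

def center_loss(features, labels, centers):
    """Half the mean squared distance from each feature to its class center."""
    diffs = features - centers[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

# Toy data: one image feature and one 3D shape feature per class.
image_feats = np.array([[0.0, 0.0], [4.0, 4.0]])
shape_feats = np.array([[2.0, 0.0], [4.0, 6.0]])
feats = np.concatenate([image_feats, shape_feats])
labels = np.array([0, 1, 0, 1])

centers = class_centers(feats, labels, num_classes=2)
loss = center_loss(feats, labels, centers)
```

Minimizing this quantity shrinks intra-class distance only; in the statement above it is paired with a cross-entropy term, which is what separates the classes from one another.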


A systematic investigation on the surface properties of Ti2AlC via first-principles calculations

Liu

Hou

et al. 2023

Surface Science


“…Contrastive learning has shown its powerful ability in self-supervised training (Chen et al 2020; He et al 2020; Oord, Li, and Vinyals 2018), which facilitates representation learning by pulling features that are semantically similar and pushing away features that are semantically different. Recently, contrastive learning has been explored under several multi-modal learning scenarios, including vision-language (Radford et al 2021; Wen et al 2021; Yuan et al 2021; Zhang et al 2021; Bakkali et al 2022), video-text (Yang, Bisk, andZolfaghari et al 2021), image-point cloud (Lin et al 2021; Afham et al 2022; Liu et al 2021b) etc. By aligning multi-modal features, it takes advantage of modality complementary to improve model performance.…”

Section: Contrastive Learning (mentioning)

confidence: 99%
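The "pull similar, push different" behaviour described in the statement above is commonly realized with an InfoNCE-style loss over paired embeddings. The sketch below is a generic symmetric version in NumPy, assuming that row i of the image batch and row i of the shape batch form the positive pair; it is not claimed to be the exact loss used by Lin et al. or by the citing paper, and the function name `info_nce` and the temperature value are hypothetical.

```python
import numpy as np

def info_nce(image_emb, shape_emb, temperature=0.1):
    """Symmetric InfoNCE: row i of each batch is assumed to be a positive pair."""
    # L2-normalize so the dot product is cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    shp = shape_emb / np.linalg.norm(shape_emb, axis=1, keepdims=True)
    logits = img @ shp.T / temperature

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row; the target is the diagonal.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the image-to-shape and shape-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))

# Toy check: correctly paired embeddings versus deliberately mismatched pairs.
aligned = info_nce(np.eye(3), np.eye(3))
shuffled = info_nce(np.eye(3), np.eye(3)[[1, 2, 0]])
```

With these toy inputs the aligned batch yields a much smaller loss than the shuffled one, which is the property the statement describes: matching cross-modal pairs are drawn together and non-matching pairs are pushed apart.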

Cross-Modal Contrastive Learning for Domain Adaptation in 3D Semantic Segmentation

Xing

Ying

Wang

et al. 2023

AAAI


Domain adaptation for 3D point cloud has attracted a lot of interest since it can avoid the time-consuming labeling process of 3D data to some extent. A recent work named xMUDA leveraged multi-modal data to domain adaptation task of 3D semantic segmentation by mimicking the predictions between 2D and 3D modalities, and outperformed the previous single modality methods only using point clouds. Based on it, in this paper, we propose a novel cross-modal contrastive learning scheme to further improve the adaptation effects. By employing constraints from the correspondences between 2D pixel features and 3D point features, our method not only facilitates interaction between the two different modalities, but also boosts feature representations in both labeled source domain and unlabeled target domain. Meanwhile, to sufficiently utilize 2D context information for domain adaptation through cross-modal learning, we introduce a neighborhood feature aggregation module to enhance pixel features. The module employs neighborhood attention to aggregate nearby pixels in the 2D image, which relieves the mismatching between the two different modalities, arising from projecting relative sparse point cloud to dense image pixels. We evaluate our method on three unsupervised domain adaptation scenarios, including country-to-country, day-to-night, and dataset-to-dataset. Experimental results show that our approach outperforms existing methods, which demonstrates the effectiveness of the proposed method.


“…Consequently, different types of approaches have been proposed to handle this problem [1], [2], [3], [4], and several works have attempted to recover 3D information from 2D images (rendered view [5], [6], [7], [8], [9], scene [10], [11], [12], sketch [13], [14], [15]). In addition, some cross-modal 3D retrieval methods [16], [17], [18] are used to search and match the 3D models in databases, which reduces the difficulty of acquiring models, but still falls short of human expectations in terms of the accuracy and matching requirements.…”

Section: Introduction (mentioning)

confidence: 99%

3D Model Retrieval Based on a 3D Shape Knowledge Graph

et al. 2020


A development of 3D construction technology has led to 3D models being applied in many fields, and the number of 3D models has exploded in recent years. Thus, 3D model retrieval has become a popular topic with many proposed approaches. However, all of the methods focus on the 3D model's global structural descriptor design based on various deep learning networks and ignore the local structural information of the 3D model and the correlation of the local structures. In this paper, we propose a novel 3D model retrieval method based on a 3D shape knowledge graph. We first introduce the concept of a geometric word that can be utilized to assemble other 3D model. Second, we construct a 3D shape knowledge graph based on the geometric words, models and their relations. Additionally, we propose a novel graph embedding method to generate embeddings of nodes. Finally, an effective multiple entities' retrieval method is used to handle the 3D model retrieval problem. More specifically, the 3D shape knowledge graph retains the basic structural information and saves these as a set of triples. Any 3D model can find its geometric words in a rich enough knowledge graph. It is reasonable that our approach can solve the cross-domain model retrieval problem. Our approach focuses on the structural information of 3D model and is not restricted by the database. We evaluate the proposed method on the ModelNet40 dataset for the 3D model retrieval task. Meanwhile, we also utilize the ShapeNet dataset to evaluate the performance of cross-domain retrieval task. Experimental results and comparisons with state-of-the-art methods demonstrate that our framework can achieve superior performance.

