2019
DOI: 10.1609/aaai.v33i01.3301126
Y2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences

Abstract: Jointly learning representations of 3D shapes and text is crucial to support tasks such as cross-modal retrieval or shape captioning. A recent method employs 3D voxels to represent 3D shapes, but this limits the approach to low resolutions due to the computational cost caused by the cubic complexity of 3D voxels. Hence the method suffers from a lack of detailed geometry. To resolve this issue, we propose Y2Seq2Seq, a view-based model, to learn cross-modal representations by joint reconstruction and predictio…

Cited by 52 publications (41 citation statements)
References 6 publications
“…There are also methods jointly learning features from point clouds and multi-view projections [47]. It is also possible to treat point clouds and views as sequences [26,17,15], or to use unsupervised learning [16].…”
Section: Related Work (mentioning)
confidence: 99%
“…Deep learning models have led to significant progress in feature learning for 3D shapes [13,12,15,14,18,19,10,20,16,11]. Here, we focus on reviewing studies on point clouds.…”
Section: Related Work (mentioning)
confidence: 99%
“…3D shape retrieval aims to match the relevant 3D shapes, which can be described as 3D mesh [9], voxel grid [8], point cloud [13] or multi-view [1,14], when given a query. The query can be 3D shape or other data representation modalities, such as: text [6], 2D image [3,15,16,17] and 2D sketch [18], and we unite them as cross-domain 3D shape retrieval. When the query is 3D shape, many deep networks have been proposed to discover the intrinsic characteristics of 3D data, including the multi-view context and geometric structure information.…”
Section: Related Work, 2.1 3D Shape Retrieval (mentioning)
confidence: 99%
“…With widespread use of 3D equipments and software [1][2][3], 3D shape retrieval, the task of matching the object (such as 2D image [4], 3D shape [5] or text [6]) from the gallery 3D shape dataset, has drawn especial attention from multimedia and computer vision communities. Despite the deep learning technique has already achieved good performance in 3D shape retrieval, there still exist some issues to be solved, especially for 2D image-based 3D shape retrieval (2D-to-3D) task.…”
Section: Introduction (mentioning)
confidence: 99%