2023
DOI: 10.1609/aaai.v37i3.25453

Language-Assisted 3D Feature Learning for Semantic Scene Understanding

Abstract: Learning descriptive 3D features is crucial for understanding 3D scenes with diverse objects and complex structures. However, it is usually unknown whether important geometric attributes and scene context obtain enough emphasis in an end-to-end trained 3D scene understanding network. To guide 3D feature learning toward important geometric attributes and scene context, we explore the help of textual scene descriptions. Given some free-form descriptions paired with 3D scenes, we extract the knowledge regarding t…

Cited by 4 publications (3 citation statements)

References 43 publications
“…Recent advances in foundation models of 2D vision and NLP have inspired the exploration of multi-modality methods in 3D models [12, 91–98]. For instance, Peng et al. [97] proposed a zero-shot approach that co-embeds point features with images and text.…”
Section: Learning-based Segmentation With Multi-modality (mentioning)
“…Zeng et al. [95] aligned 3D representations to open-world vocabularies via a cross-modal contrastive objective. Zhang et al. [98] performed text-scene paired semantic understanding with language-assisted learning. How to facilitate and adapt multi-modalities with point clouds for better scene understanding is worth exploring.…”
Section: Learning-based Segmentation With Multi-modality (mentioning)
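A cross-modal contrastive objective of the kind mentioned above is commonly written as a symmetric InfoNCE loss over paired embeddings, as popularized by CLIP. What follows is a minimal, generic sketch built on several assumptions. It is not the objective used by Zeng et al. [95] or by the paper summarized here. The function names are hypothetical. Features are plain Python lists that are assumed to already live in a shared embedding space. The temperature of 0.07 is an illustrative default.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def log_softmax_at(logits: List[float], idx: int) -> float:
    """Numerically stable log(softmax(logits)[idx])."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[idx] - log_sum_exp


def symmetric_info_nce(point_feats: List[Vector],
                       text_feats: List[Vector],
                       temperature: float = 0.07) -> float:
    """Hypothetical symmetric InfoNCE loss: row i of point_feats is the
    positive match for row i of text_feats; all other rows are negatives."""
    assert point_feats and len(point_feats) == len(text_feats)
    p = [l2_normalize(v) for v in point_feats]
    t = [l2_normalize(v) for v in text_feats]
    n = len(p)
    # sim[i][j] = cosine similarity of 3D feature i and text feature j, scaled by 1/T
    sim = [[sum(a * b for a, b in zip(p[i], t[j])) / temperature
            for j in range(n)] for i in range(n)]
    # 3D -> text: each 3D feature should score its own description highest (rows)
    loss_p2t = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # text -> 3D: each description should score its own 3D feature highest (columns)
    loss_t2p = -sum(log_softmax_at([sim[j][i] for j in range(n)], i)
                    for i in range(n)) / n
    return 0.5 * (loss_p2t + loss_t2p)


if __name__ == "__main__":
    points = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    texts_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
    texts_shuffled = [texts_aligned[1], texts_aligned[2], texts_aligned[0]]
    # Matched pairs should give a lower loss than mismatched ones.
    print(symmetric_info_nce(points, texts_aligned)
          < symmetric_info_nce(points, texts_shuffled))  # True
```

Averaging both directions makes the loss symmetric in its two arguments. This pushes each 3D feature toward its own description and each description toward its own 3D feature, while pulling both away from the other pairs in the batch.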