2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros40897.2019.8968603

Grounding Language Attributes to Objects using Bayesian Eigenobjects

Abstract: We develop a system to disambiguate object instances within the same class based on simple physical descriptions. The system takes as input a natural language phrase and a depth image containing a segmented object and predicts how similar the observed object is to the object described by the phrase. Our system is designed to learn from only a small amount of human-labeled language data and generalize to viewpoints not represented in the language-annotated depth image training set. By decoupling 3D shape repres…
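The abstract describes, at a high level, a joint embedding model: a language phrase and an observed shape are each mapped into a shared space where their similarity can be scored. As a rough illustration only, here is a minimal sketch in Python/PyTorch of that general idea; it is not the authors' implementation, and every module name, layer size, and the toy depth-image encoder below are assumptions made for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LanguageShapeMatcher(nn.Module):
    # Hypothetical joint embedding: phrase and depth view share one space.
    def __init__(self, vocab_size=5000, embed_dim=64, shared_dim=128):
        super().__init__()
        # Language branch: embed tokens, mean-pool, project to the shared space.
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.lang_proj = nn.Linear(embed_dim, shared_dim)
        # Shape branch: a small CNN over a single-channel depth image, standing
        # in for the paper's low-dimensional 3D shape representation.
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.shape_proj = nn.Linear(32, shared_dim)

    def forward(self, token_ids, depth_image):
        # token_ids: (batch, seq_len) word indices; depth_image: (batch, 1, H, W)
        lang = self.lang_proj(self.word_embed(token_ids).mean(dim=1))
        shape = self.shape_proj(self.depth_encoder(depth_image))
        # Cosine similarity in the shared space: higher means the observed
        # object better matches the described one.
        return F.cosine_similarity(lang, shape, dim=-1)

model = LanguageShapeMatcher()
scores = model(torch.randint(0, 5000, (2, 8)), torch.rand(2, 1, 64, 64))
print(scores)  # one similarity score per (phrase, depth image) pair

A model of this kind would typically be trained contrastively, pulling matching (phrase, view) pairs together in the shared space and pushing mismatched pairs apart, which is what makes the similarity score meaningful at test time.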

Cited by 23 publications (14 citation statements) | References 24 publications
“…In the robotics domain, Cohen et al. [7] combine Bayesian Eigenobjects with a language grounding model that maps natural language phrases and segmented depth images to a shared space. This Bayesian Eigenobjects approach is however evaluated on only three classes of objects.…”
Section: Related Work (mentioning)
confidence: 99%
“…Recent works in vision and language navigation [9] and object manipulation [10] can handle complex instructions using multi-modal information, but they still suffer from ambiguity and cascading errors due to misprediction. Although several works have specifically focused on the visual grounding of natural object descriptions [11,12,13], they do not tackle ambiguity and incompleteness using dialogue. Moreover, the predominant approach of end-to-end training for visual grounding is difficult to use in a dialogue system, because the generation of a question pertaining to the instruction requires a finer-grained understanding of the scene.…”
Section: Related Work (mentioning)
confidence: 99%
“…Language Grounding in 3D: Prior works have associated single-word attributes with 3D object models based on latent representations of 3D meshes [26]. To learn spatial language, vision-and-language navigation (VLN) [27] models infer navigation actions from instructions and visual observations in 3D simulated worlds.…”
Section: Related Work (mentioning)
confidence: 99%