2021 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra48506.2021.9560895
Spatial Reasoning from Natural Language Instructions for Robot Manipulation

Cited by 20 publications (10 citation statements)
References 21 publications
“…In this paper, we focus on spatial reasoning over text, which can be described as inferring the implicit spatial relations from the explicit relations described in the text. Spatial reasoning plays a crucial role in diverse domains, including language grounding (Liu et al., 2022), navigation (Zhang et al., 2021), and human-robot interaction (Venkatesh et al., 2021). By studying this task, we can analyze both the reading comprehension and logical reasoning capabilities of models.…”
Section: Introduction
confidence: 99%
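The distinction drawn above, between explicit relations stated in the text and implicit relations that must be inferred, can be illustrated with a minimal sketch. This is a hypothetical example, not code from the cited paper: it treats relations as triples and derives implicit facts by transitive closure over a relation such as `left_of`.

```python
def spatial_closure(facts):
    """Given a set of (entity, relation, entity) triples stated explicitly,
    return the closure under transitivity, i.e. the explicit facts plus
    the implicit ones they entail (assuming the relation is transitive)."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(inferred):
            for (c, r2, d) in list(inferred):
                # Chain a -r-> b and b -r-> d into the implicit fact a -r-> d.
                if r1 == r2 and b == c and a != d and (a, r1, d) not in inferred:
                    inferred.add((a, r1, d))
                    changed = True
    return inferred

# Explicit relations extracted from text such as
# "the mug is left of the plate; the plate is left of the bowl".
explicit = {("mug", "left_of", "plate"), ("plate", "left_of", "bowl")}
implicit = spatial_closure(explicit) - explicit
# implicit == {("mug", "left_of", "bowl")}
```

A system evaluated on this task must perform both steps: reading comprehension to extract the explicit triples from language, and logical reasoning to derive the implicit ones.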
“…It is common for referring expressions to contain relational concepts between multiple entities in the scene, and exploiting them has been shown to improve the capability of models to comprehend those expressions (Zender et al., 2009; Nagaraja et al., 2016; Hu et al., 2017; Shridhar et al., 2020). In particular, these relationships tend to be spatial relations from the point of reference of the user, and the robot must be able to cope with this kind of description in order to resolve any ambiguities and eventually identify the right entity in the scene (Ding et al., 2021; Venkatesh et al., 2021; Roh et al., 2022). Ding et al. (2021) present a transformer-based architecture combining the language features with a vision-guided attention framework to model the global context in a multi-modal fashion.…”
Section: Spatial Referring Expressions
confidence: 99%
“…Here, an application's row embodies the principal modality, whereas the column encapsulates the ancillary modality.

Vision:
- Gesture interpretation for visual navigation in VR environment [78]
- Spatial reasoning for robot pickup manipulation in HRC [169]
- Human activity recognition for safe HRC [27]
- Hand gesture recognition for robotic control and navigation and HMI [137]
- Human activity recognition for HRI [95]
- Vision-and-voice navigation for autonomous agent interaction with human and environment [168]
- Contactless force feedback and gesture tracking for enhancing the accuracy and efficiency of human-robot manipulation tasks [45]
- Human position estimation for safe HRC [94]
- 3D object detection in autonomous driving [194]
- MR bidirectional communication for HRI [82]
- Visuo-haptic guidance for mobile collaborative robotic assistant MOCA [173]
- Human activity recognition for HRC in the noisy environment [151]
- Audio-visual scene aware dialog for human-machine conversation [165]
- Human activity recognition for pHRI and HRC [150]
- Gesture recognition for HRI and social robot [130]
- Predicting interactions between objects and environment by tactile and visual feedback for intelligent robotics [171]
- Emotion recognition for HCI [92]
- Bi-directional navigation intent communication for safe HRI [83]
- Visual-inertial hand motion tracking for HRI and VR & AR application [72]

Auditory and language…”
Section: Combination Of Two Types Of Modalities
confidence: 99%
“…g,h) LANG-UNet algorithm for spatial reasoning based on the fusion of text and vision modalities: (g) overall illustration of the spatial reasoning task; (h) the LANG-UNet model architecture. Reproduced with permission [169]. Copyright 2021, IEEE.…”
confidence: 99%