2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros40897.2019.8968165
Early Fusion for Goal Directed Robotic Vision

Abstract: Building perceptual systems for robotics that perform well under tight computational budgets requires novel architectures which rethink the traditional computer vision pipeline. Modern vision architectures require the agent to build a summary representation of the entire scene, even if most of the input is irrelevant to the agent's current goal. In this work, we flip this paradigm by introducing EARLY FUSION vision models that condition on a goal to build custom representations for downstream tasks. We show t…

Cited by 7 publications (4 citation statements)
References 24 publications
“…In our experiments, we also find that a naive late fusion architecture minus the optical flow yields poor results in RL settings (see Section 5.2). This observation is consistent with recent findings in related domains like visual navigation (Walsman et al, 2019).…”
Section: Introduction (supporting)
confidence: 94%
“…Another issue is the investigation of the network architecture for handling an action or language-based instruction beyond an input image. Since the image and another input are used to control the camera agent, designing an effective fusion model is an important problem [20], [42], [43]. An early fusion approach for goal directed navigation fuses the goal information with the input state, followed by a convolution process, to generate the feature for navigation [42]. Alternatively, a gated attention architecture fuses language-based instruction with the input state [43].…”
Section: A. Visual Navigation With and Without External Information (mentioning)
confidence: 99%
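The early fusion pattern described in that statement — broadcasting goal information across the input state and concatenating it before any convolution — can be sketched as follows. All shapes, the 1x1 convolution, and variable names here are illustrative assumptions, not the cited paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: a 3-channel 8x8 observation and a 4-dim goal embedding.
obs = rng.standard_normal((3, 8, 8))
goal = rng.standard_normal(4)

# EARLY FUSION: tile the goal vector over every spatial location and
# concatenate it with the image channels *before* convolution runs,
# so the very first layer can already condition on the goal.
goal_map = np.broadcast_to(goal[:, None, None], (4, 8, 8))
fused = np.concatenate([obs, goal_map], axis=0)  # shape (3+4, 8, 8)

# A 1x1 convolution is a per-pixel linear map over channels; here it
# mixes image and goal information into a navigation feature map.
w = rng.standard_normal((16, 7))                 # out_ch=16, in_ch=7
features = np.einsum('oc,chw->ohw', w, fused)

print(fused.shape)     # (7, 8, 8)
print(features.shape)  # (16, 8, 8)
```

By contrast, a late fusion model would run the convolutional stack on `obs` alone and only combine the result with `goal` at the end, after a scene-wide representation has already been built.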
“…Concatenation is the most popular method for performing early fusion, and is also known as an additive method [353]. Examples include the studies of Liu et al [370], Walsman et al [371], Petscharnig et al [372], and Sindagi et al [373]. A multiplication approach takes the product of the features, as in the work of Zadeh et al [374] for fusing audio and video, or in the study of Fukui et al [375] for fusing visual and text features through bilinear pooling.…”
Section: Multimodality (mentioning)
confidence: 99%
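The three fusion families contrasted in that survey passage can be sketched in a few lines. The feature vectors and dimensions below are hypothetical; the bilinear (outer-product) variant follows the general idea cited for Fukui et al., not their compact-pooling implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
audio = rng.standard_normal(8)  # hypothetical audio feature
video = rng.standard_normal(8)  # hypothetical video feature

# Concatenation (additive early fusion): features sit side by side;
# later layers must learn any cross-modal interactions.
concat = np.concatenate([audio, video])    # dim 16

# Multiplication: the elementwise product hard-wires direct
# feature-by-feature interactions between the modalities.
product = audio * video                    # dim 8

# Bilinear pooling: the outer product captures every pairwise
# interaction, at the cost of a quadratic feature size.
bilinear = np.outer(audio, video).ravel()  # dim 64

print(concat.shape, product.shape, bilinear.shape)  # (16,) (8,) (64,)
```

The trade-off is interaction strength versus dimensionality: concatenation is cheap but indirect, while bilinear pooling models all cross-terms explicitly and grows as the product of the input dimensions.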