2022
DOI: 10.48550/arxiv.2203.03580
Preprint
The Unsurprising Effectiveness of Pre-Trained Vision Models for Control

Cited by 13 publications (20 citation statements)
References 0 publications
“…They apply it to VirtualHome and BabyAI tasks, and find that the inclusion of the pretrained language model improves generalisation to novel tasks. Similarly, Parisi et al (2022) demonstrate that vision models pretrained with self-supervised learning, especially crop segmentations and momentum contrast, can be effectively incorporated into control policies.…”
Section: Related Work
confidence: 92%
“…Unsupervised representation learning for visual control Following the work of Jaderberg et al [64], which demonstrated the effectiveness of auxiliary unsupervised objectives for RL, a variety of unsupervised learning objectives have been studied, including future latent reconstruction [27,65,66,67], bisimulation [68,69], contrastive learning [70,71,72,73,74,30,75,29,76,77], world model learning [8,9,10,54] and reconstruction [78,79]. Recent approaches have also demonstrated that simple data augmentations can sometimes be effective even without such representation learning objectives [80,81,82].…”
Section: Discussion
confidence: 99%
“…Pretrained CLIP features have been used in a number of recent robotics papers to speed up control and navigation tasks. The features can condition the policy network [Khandelwal et al., 2021], or they can be fused throughout the visual encoder to integrate semantic information about the environment [Parisi et al., 2022]. Pretrained language models have also been shown to provide useful initializations from which to train policies to imitate offline trajectories [Reid et al., 2022, Li et al., 2022].…”
Section: Related Work
confidence: 99%
“…A comparison between pretrained vision-language representations and pretrained ImageNet representations would also be of interest. Because ImageNet representations are optimized for classification rather than for encoding objects and relationships, we expect that ImageNet representations would perform more poorly, as shown by Du et al. [2021] and Parisi et al. [2022].…”
Section: Number of Objects Held by LSE-NGU Variants
confidence: 99%