2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw.2017.69
Time-Contrastive Networks: Self-Supervised Learning from Multi-view Observation

Cited by 103 publications (91 citation statements); references indexed: 0 publications.
“…as real-world tasks such as cooking or assembly are inherently long-horizon and hierarchical. Recent works have attempted learning from pixel space [11,27,35,42], but learning long-horizon tasks from video in a one-shot setting remains a challenge, since both the visual learning and task complexity exacerbate the demand for better data efficiency. Our solution explicitly models the compositionality in the task structure and policy, enabling us to scale one-shot visual imitation to complex tasks.…”
Section: Conjugate Task Graph
Citation type: mentioning, confidence: 99%
“…We specifically approach this problem from the perspective of representation learning, using the learned embedding as a goal metric for reinforcement learning for reaching goal images. Prior works have aimed to learn representations for control through auto-encoding [27,48,15,16,34], pretrained supervised features [40], spatial structure [15,16,23], and viewpoint invariance [41]. However, unlike these works, we build a metric that specifically takes into account how actions lead to particular states, leading to control-centric representations that capture aspects of the observation that can be controlled, while discarding other elements.…”
Section: Related Work
Citation type: mentioning, confidence: 99%
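
The excerpt above describes two ideas associated with this line of work: training a viewpoint-invariant embedding from multi-view video (the time-contrastive objective of the cited paper) and then using distance in that embedding space as a goal metric, e.g. a reward for reaching a goal image. The following is a minimal, hedged sketch of both ideas under stated assumptions; it is not the authors' released code, and the encoder architecture, the `TCNEncoder` name, the margin value, and the reward scaling are illustrative choices.

```python
# Hedged sketch of a multi-view time-contrastive (triplet) objective and an
# embedding-distance goal reward. Architecture and hyperparameters are
# illustrative assumptions, not the published implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TCNEncoder(nn.Module):
    """Small CNN mapping an RGB frame to an L2-normalized embedding."""

    def __init__(self, embed_dim: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x).flatten(1)
        return F.normalize(self.fc(h), dim=1)


def time_contrastive_loss(anchor, positive, negative, margin: float = 0.2):
    """Triplet loss: anchor and positive are simultaneous frames from two
    viewpoints; the negative is a temporally distant frame from the anchor's
    own viewpoint."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()


def goal_reward(encoder, obs_img, goal_img):
    """Goal-image reaching reward: negative distance in embedding space."""
    with torch.no_grad():
        z_obs, z_goal = encoder(obs_img), encoder(goal_img)
    return -(z_obs - z_goal).norm(dim=1)


if __name__ == "__main__":
    enc = TCNEncoder()
    frames = torch.randn(3, 4, 3, 64, 64)  # anchor, positive, negative batches
    loss = time_contrastive_loss(*(enc(f) for f in frames))
    print(loss.item(), goal_reward(enc, frames[0], frames[1]))
```

The point of the sketch is the division of labor the excerpt contrasts: the contrastive loss shapes the representation, while the reward function is just a distance in that learned space rather than a hand-designed image metric.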
“…An advantage of our modular design is that new advances relevant to the various stages of the pipeline can be readily incorporated to improve the overall effectiveness of the framework. However, an exciting direction for future work is to investigate methods for more end-to-end learning from visual demonstrations, for example taking inspiration from Sermanet et al [2017] and Yu et al [2018a], which may reduce the dependence on accurate pose estimators. Another exciting direction is to capitalize on our method's ability to learn from video clips and focus on large, outdoor activities, as well as motions of nonhuman animals that are conventionally very difficult, if not impossible, to mocap.…”
Section: Motion Completion
Citation type: mentioning, confidence: 99%