Behavioral Cloning from Observation
2018 · Preprint
DOI: 10.48550/arxiv.1805.01954

Cited by 61 publications (85 citation statements) · References: 0 publications
“…IfO [40]: learning a policy from aligned observations only
BCO [69]: adopting the IfO setting and integrating it with BC
TCN [61]: a multi-viewpoint self-supervised IfO method
One-shot IfO [5]: extracting features from unlabeled and unaligned gameplay footage
Zero-Shot Visual Imitation [51]: using the distance between observations to predict and penalize actions
IfO survey [72]: a detailed classification of recent IfO methods
Imitating Latent Policies from Observation [19]: inferring latent policies directly from state observations
GAIfO [70]: a generative adversarial structure combined with IfO
IfO Leveraging Proprioception [71]: leveraging the agent's internal information
OPOLO [81]: using the dual form of the expectation function and an adversarial structure to achieve off-policy IfO
IfO follows the nature of how humans and animals imitate. For example, people learn to dance by following a video; this kind of following is achieved through detecting changes of pose and taking actions to match the pose, which is similar to how IfO solves the problem.…”
Section: Publication Description
Mentioning confidence: 99%
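The BCO [69] entry above is terse; for concreteness, here is a minimal sketch of the recipe it summarizes: learn an inverse dynamics model from the agent's own interaction data, use it to infer the actions missing from observation-only demonstrations, then apply ordinary behavioral cloning. This is a sketch under assumptions, not the paper's implementation: it presumes continuous states and actions, and every dimension, network size, and hyperparameter below is illustrative.

```python
# Minimal BCO-style sketch: inverse dynamics model + behavioral cloning.
# All shapes, architectures, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 8, 2  # assumed continuous-control dimensions


def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))


# Inverse dynamics model: (s_t, s_{t+1}) -> a_t, trained on the agent's own
# (state, action, next state) transitions gathered by interacting itself.
inv_model = mlp(2 * OBS_DIM, ACT_DIM)
policy = mlp(OBS_DIM, ACT_DIM)


def train_inverse_model(s, a, s_next, epochs=100):
    opt = torch.optim.Adam(inv_model.parameters(), lr=1e-3)
    for _ in range(epochs):
        pred = inv_model(torch.cat([s, s_next], dim=-1))
        loss = nn.functional.mse_loss(pred, a)
        opt.zero_grad(); loss.backward(); opt.step()


def bco(agent_s, agent_a, agent_s_next, demo_s, demo_s_next, epochs=100):
    # 1) Fit the inverse dynamics model on the agent's own experience.
    train_inverse_model(agent_s, agent_a, agent_s_next)
    # 2) Label the observation-only demonstrations with inferred actions.
    with torch.no_grad():
        inferred_a = inv_model(torch.cat([demo_s, demo_s_next], dim=-1))
    # 3) Ordinary behavioral cloning on the now action-labeled demonstrations.
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(policy(demo_s), inferred_a)
        opt.zero_grad(); loss.backward(); opt.step()
    return policy
```

This mirrors the "adopting the IfO setting and integrating it with BC" description: the inverse model supplies the actions that the observation-only setting withholds.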
“…However, bandit algorithms assume that taking actions does not affect the state [17], whereas recommendations in fact do influence user behavior [26]; hence RL is a more suitable choice for the RS task. Another related field is imitation learning, where the policy is learned from expert demonstrations [8,9,23,34].…”
Section: Related Work
Mentioning confidence: 99%
“…This formulation enables RL agents to exploit large datasets to learn a policy that outperforms the behavioral policy, since offline training is significantly cheaper than interacting with the environment in real-world settings. Despite the promise of the framework, current offline RL algorithms still underperform naive baselines such as Behavioral Cloning (Torabi et al., 2018), as reported by Fu et al. (2020).…”
Section: Introduction
Mentioning confidence: 99%
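For context, the "naive baseline" this excerpt refers to is deliberately simple: behavioral cloning on a fixed offline dataset is plain supervised regression from logged states to logged actions, using no rewards at all. A minimal sketch under the same illustrative assumptions as above (shapes and hyperparameters are not from any cited paper):

```python
# Hedged sketch of a plain behavioral-cloning baseline on an offline dataset.
import torch
import torch.nn as nn


def bc_baseline(states, actions, epochs=200, lr=1e-3):
    """states: (N, obs_dim) tensor; actions: (N, act_dim) tensor."""
    policy = nn.Sequential(
        nn.Linear(states.shape[1], 256), nn.ReLU(),
        nn.Linear(256, actions.shape[1]),
    )
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        # Supervised loss only; rewards in the dataset are ignored entirely.
        loss = nn.functional.mse_loss(policy(states), actions)
        opt.zero_grad(); loss.backward(); opt.step()
    return policy
```

The excerpt's point is that even this reward-free baseline can beat more elaborate offline RL algorithms on standard benchmarks.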