2020
DOI: 10.1299/jsmermd.2020.2a1-l11

A Study On Accelerating Adversarial Imitation Learning By Behavioral Cloning

Cited by 9 publications (10 citation statements)
References 0 publications

“…An alternating interaction between weight estimation and GAIL training therefore holds. There is also some research (Sasaki and Yamashina 2021; Kim et al 2021; Xu et al 2022; Liu et al 2022) on addressing the imperfect-demonstrations issue in offline imitation learning. BCND (Sasaki and Yamashina 2021) is a weighted behavioral cloning method that uses the action distribution of the learned policy as confidence.…”
Section: Related Work (mentioning)
confidence: 99%
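As a rough illustration of the weighted behavioral cloning idea described in this statement, the sketch below reweights each demonstration's log-likelihood loss by the learned policy's own probability of the demonstrated action. The network shape, the detached `exp(log_prob)` weight, and all names are assumptions for illustration, not BCND's exact formulation (which, for instance, involves policy ensembles and iterated reweighting).

```python
# Minimal sketch of policy-confidence-weighted behavioral cloning
# (illustrative only; not the exact BCND objective).
import torch
import torch.nn as nn

class DiscretePolicy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))

def weighted_bc_loss(policy: DiscretePolicy,
                     obs: torch.Tensor,
                     actions: torch.Tensor) -> torch.Tensor:
    dist = policy(obs)
    log_prob = dist.log_prob(actions)
    # Confidence weight: the policy's current probability of the
    # demonstrated action, detached so no gradient flows through it.
    weight = log_prob.detach().exp()
    return -(weight * log_prob).mean()
```

Detaching the weight keeps the update a standard weighted maximum-likelihood step: demonstrations the current policy already finds plausible are trusted more, which is the sense in which the policy's action distribution acts as confidence.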
“…The latter instead combines these two steps in a single objective function. WGAIL successfully connects confidence estimation to the discriminator in GAIL, and BCND (Sasaki and Yamashina 2021) demonstrates that confidence can be derived from the agent policy itself. These two methods therefore relax the assumption of labeled confidence and can be run without access to prior information.…”
Section: Introduction (mentioning)
confidence: 99%
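To make the discriminator-as-confidence connection concrete, here is a minimal sketch, under assumed names and a continuous action space, of reading a per-sample confidence weight off a GAIL-style discriminator's output. This is not WGAIL's actual objective, just the general mechanism the statement refers to.

```python
# Illustrative sketch: a GAIL-style discriminator's output reused as a
# per-sample confidence weight on demonstration data (assumed interface).
import torch

def discriminator_confidence(disc: torch.nn.Module,
                             obs: torch.Tensor,
                             actions: torch.Tensor) -> torch.Tensor:
    # Assumption: disc maps concatenated (s, a) to a logit where higher
    # means "more expert-like"; sigmoid squashes it into [0, 1].
    logits = disc(torch.cat([obs, actions], dim=-1)).squeeze(-1)
    return torch.sigmoid(logits).detach()
```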
“…Yu et al (2021) use Q-functions to filter which data should be shared between tasks in a multi-task setting. In the imitation learning setting, Nair et al (2018) and Sasaki & Yamashina (2020) use Q-functions to filter out low-quality demonstrations so that they are not used for training. In both cases, the Q-function is used to evaluate candidate data before it is used for training.…”
Section: Using Q-Functions As Filters (mentioning)
confidence: 99%
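A minimal sketch of this kind of Q-function filtering, with all names and the threshold chosen for illustration: score each demonstration transition with the Q-function and keep only those rated above a cutoff, so low-quality data never enters the training batch.

```python
# Illustrative Q-function filter for demonstration data
# (assumed interface; not any cited paper's exact procedure).
import torch

def filter_demonstrations(q_net: torch.nn.Module,
                          obs: torch.Tensor,      # (N, obs_dim)
                          actions: torch.Tensor,  # (N, act_dim)
                          threshold: float) -> torch.Tensor:
    """Return a boolean mask keeping transitions the Q-function rates highly."""
    with torch.no_grad():
        q_values = q_net(torch.cat([obs, actions], dim=-1)).squeeze(-1)
    return q_values > threshold  # index the demonstration batch with this mask
```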
“…The Q-function, Q_i(s, a), of Task i estimates the expected discounted return of the policy after taking action a at state s (Watkins & Dayan, 1992). Although this is an estimate acquired during training, it is a critical component in many state-of-the-art RL algorithms (Haarnoja et al, 2018; Lillicrap et al, 2015) and has been used to filter for high-quality data in multi-task (Yu et al, 2021) and imitation learning settings (Nair et al, 2018; Sasaki & Yamashina, 2020), which suggests the Q-function is still very effective for evaluating and comparing actions during training. Unlike single-task RL, we use the Q-function as a switch that rates action proposals from other tasks' policies for the current task's state s. This simple and intuitive function is state- and task-dependent, gives the current best estimate of which behaviors are most helpful, and adapts quickly to changes in its own and other policies during online learning.…”
Section: Q-Switch (mentioning)
confidence: 99%
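As a rough sketch of the switch idea in this statement (all module names, shapes, and the deterministic-policy assumption are mine, not the cited paper's): collect one action proposal per task policy, score each with the current task's Q-function Q_i(s, a), and execute the highest-rated proposal.

```python
# Illustrative Q-switch: rate action proposals from several task policies
# with the current task's Q-function and return the best-rated one.
import torch

def q_switch(q_i, policies, obs):
    """Pick the proposal rated best by the current task's Q-function.

    Assumes an unbatched state `obs` of shape (obs_dim,), deterministic
    policies mapping obs -> action of shape (act_dim,), and a Q-network
    q_i mapping the concatenated (s, a) vector to a scalar value.
    """
    with torch.no_grad():
        proposals = [pi(obs) for pi in policies]  # one candidate per task
        scores = torch.stack(
            [q_i(torch.cat([obs, a], dim=-1)) for a in proposals]
        ).squeeze(-1)                             # Q_i(s, a) per proposal
    return proposals[int(scores.argmax())]
```

Because the switch only reads the Q-function, it can be re-evaluated at every state at negligible cost, which is what lets it track changes in its own and other policies during online learning.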