2021
DOI: 10.48550/arxiv.2106.15587
Preprint

Generalization of Reinforcement Learning with Policy-Aware Adversarial Data Augmentation

Abstract: The generalization gap in reinforcement learning (RL) has been a significant obstacle that prevents the RL agent from learning general skills and adapting to varying environments. Increasing the generalization capacity of RL systems can significantly improve their performance in real-world environments. In this work, we propose a novel policy-aware adversarial data augmentation method to augment the standard policy learning method with automatically generated trajectory data. Different from the com…
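The truncated abstract does not spell out the procedure, but the general idea behind policy-aware adversarial data augmentation can be illustrated with a minimal sketch: perturb observations in the direction that most degrades the current policy's surrogate objective, then train on the original and perturbed data together. The names policy_net, advantages, and the budget epsilon below are illustrative assumptions, not the authors' implementation.

    import torch

    def policy_aware_adversarial_augment(policy_net, obs, actions, advantages, epsilon=0.01):
        # Hypothetical sketch, not the paper's exact algorithm.
        # policy_net(obs) is assumed to return a torch.distributions.Distribution.
        obs_adv = obs.clone().detach().requires_grad_(True)
        log_probs = policy_net(obs_adv).log_prob(actions)
        surrogate = (log_probs * advantages).mean()            # policy-gradient surrogate
        grad, = torch.autograd.grad(surrogate, obs_adv)
        obs_adv = (obs_adv - epsilon * grad.sign()).detach()   # FGSM-style step against the policy
        # Return the original batch concatenated with its adversarial counterpart.
        return (torch.cat([obs, obs_adv]),
                torch.cat([actions, actions]),
                torch.cat([advantages, advantages]))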

Cited by 3 publications (4 citation statements) | References 26 publications

Citation statements (ordered by relevance):
“…Zhou et al. (2021, MixStyle) mixes style statistics across spatial dimensions in CNNs for increased data diversity. All these methods (Zhang & Guo, 2021; Lee et al., 2020; Zhou et al., 2021) show improved performance on CoinRun (Cobbe et al., 2019) or OpenAI Procgen (Cobbe et al., 2020a) by improving both training and testing performance, and some also show gains on other benchmarks such as visually distracting DeepMind Control (DMC) variants. Hansen and Wang (2021, SODA) use similar augmentations as before but only to learn a more robust image encoder, while the policy is trained on non-augmented data, demonstrating good performance on DMC-GB (Hansen & Wang, 2021).…”
Section: Data Augmentation and Domain Randomisation
confidence: 99%
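A rough reading of the MixStyle operation cited above (Zhou et al., 2021): channel-wise statistics of CNN feature maps are computed over the spatial dimensions and mixed between instances of a batch. The sketch below is a simplified illustration under that assumption, not the reference implementation.

    import torch

    def mixstyle(x, alpha=0.1, eps=1e-6):
        # x: CNN feature maps of shape (batch, channels, height, width).
        mu = x.mean(dim=(2, 3), keepdim=True)          # per-instance, per-channel mean
        sig = x.std(dim=(2, 3), keepdim=True) + eps    # per-instance, per-channel std
        x_norm = (x - mu) / sig
        lam = torch.distributions.Beta(alpha, alpha).sample((x.size(0), 1, 1, 1))
        perm = torch.randperm(x.size(0))               # shuffle the batch
        mu_mix = lam * mu + (1 - lam) * mu[perm]       # mix style statistics across instances
        sig_mix = lam * sig + (1 - lam) * sig[perm]
        return x_norm * sig_mix + mu_mix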
“…In this case, the authors decouple the augmentation process from policy learning in order to benefit from data augmentation without introducing further complexity. A methodology for enhancing generalization by generating adversarial trajectories, which are combined with the original ones during training, is described in [8]. In a similar manner, augmented and non-augmented data are used concurrently to jointly optimize the state-action value function based on a redefined objective [29].…”
Section: Related Work
confidence: 99%
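The joint optimization over augmented and non-augmented data described above can be sketched in the spirit of DrQ-style augmented Q-learning; the exact objective in [29] may differ, and q_net, target_q_net, and augment (e.g. random image shifts) are assumed placeholders for a discrete-action setting.

    import torch
    import torch.nn.functional as F

    def augmented_q_loss(q_net, target_q_net, augment, batch, gamma=0.99, K=2, M=2):
        # batch: obs, act (int64, shape [B, 1]), rew, next_obs, done (shape [B, 1]).
        obs, act, rew, next_obs, done = batch
        with torch.no_grad():
            # TD target averaged over K random augmentations of the next observation.
            target = 0.0
            for _ in range(K):
                q_next = target_q_net(augment(next_obs)).max(dim=1, keepdim=True).values
                target = target + (rew + gamma * (1.0 - done) * q_next) / K
        # Q loss averaged over the non-augmented view plus M augmented views.
        views = [obs] + [augment(obs) for _ in range(M)]
        losses = [F.mse_loss(q_net(v).gather(1, act), target) for v in views]
        return sum(losses) / len(losses)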
“…In the context of reinforcement learning (RL), the generalization and convergence speed of the agent are among the main subjects of research, as many approaches struggle to efficiently utilize the available data [8,9], especially in environments with sparse rewards. Thus, several attempts have been made to augment the datasets used for the agent's training, achieving quite satisfactory results and indicating that data augmentation is a promising direction in RL [10-13].…”
Section: Introduction
confidence: 99%
“…End-to-end training of deep models with RL objectives has been shown to be prone to overfitting to spurious features that are only relevant in the observed transitions (Song et al., 2019; Bertran et al., 2020). To address this, prior work considered different data augmentation strategies (Laskin et al., 2020b; Yarats et al., 2021a; Cobbe et al., 2019), and online adaptation methods on top of these to alleviate engineering burdens (Zhang & Guo, 2021; Raileanu et al., 2020). Alternative approaches have considered problem-specific properties of the environment (Zhang et al., 2020; Raileanu & Fergus, 2021), auxiliary losses (Laskin et al., 2020a; Schwarzer et al., 2020), and frozen pre-trained layers (Yarats et al., 2021b; Stooke et al., 2021).…”
Section: Related Work
confidence: 99%