2020
DOI: 10.48550/arxiv.2006.01096
Preprint

Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning

Anoopkumar Sonar,
Vincent Pacelli,
Anirudha Majumdar

Abstract: A fundamental challenge in reinforcement learning is to learn policies that generalize beyond the operating domain experienced during training. In this paper, we approach this challenge through the following invariance principle: an agent must find a representation such that there exists an action-predictor built on top of this representation that is simultaneously optimal across all training domains. Intuitively, the resulting invariant policy enhances generalization by finding causes of successful actions. W…
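The invariance principle described in the abstract parallels Invariant Risk Minimization: learn a shared representation whose action predictor is simultaneously optimal in every training domain. The sketch below shows one common way such an objective can be operationalized, using an IRMv1-style gradient penalty added to per-domain action-prediction losses. This is a minimal sketch under assumptions: the network (`PolicyNet`), the synthetic per-domain batches, and the `penalty_weight` trade-off are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hedged sketch: IRMv1-style invariance penalty on per-domain action prediction.
# PolicyNet, the synthetic data, and penalty_weight are hypothetical, for illustration only.

class PolicyNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_actions)   # action predictor built on top of the representation

    def forward(self, obs):
        return self.head(self.encoder(obs))        # action logits

def irm_penalty(logits, actions):
    """Squared gradient of the domain loss w.r.t. a dummy scalar multiplier on the
    logits -- the IRMv1 surrogate for 'the predictor is optimal in this domain'."""
    scale = torch.ones(1, requires_grad=True)
    loss = nn.functional.cross_entropy(logits * scale, actions)
    grad = torch.autograd.grad(loss, scale, create_graph=True)[0]
    return (grad ** 2).sum()

# One gradient step over several training domains (random placeholder data).
obs_dim, n_actions = 8, 4
policy = PolicyNet(obs_dim, n_actions)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
penalty_weight = 10.0                               # hypothetical trade-off coefficient

domains = [(torch.randn(32, obs_dim), torch.randint(n_actions, (32,))) for _ in range(3)]

total = 0.0
for obs, actions in domains:                        # target actions per training domain
    logits = policy(obs)
    total = total + nn.functional.cross_entropy(logits, actions) \
                  + penalty_weight * irm_penalty(logits, actions)

opt.zero_grad()
(total / len(domains)).backward()
opt.step()
```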

Cited by 5 publications (9 citation statements) · References 30 publications

“…Several approaches use multiple contexts to learn an invariant representation, which is then assumed to generalise well to testing contexts. Sonar et al. [140, IPO] apply ideas from Invariant Risk Minimization [141] to policy optimisation, learning a representation which enables jointly optimal action prediction across all domains, and show improved performance over PPO on several visual and dynamics variation environments. Bertrán et al. [142, IAPE] introduce the Instance MDP, an alternative formalism for the generalisation problem, and then motivate theoretically an approach to learn a collection of policies on subsets of the training domains, such that the aggregate policy is invariant to any individual context-specific features which would not generalise.…”
Section: Much Work Draws On Causal Inference To Learn Invariant Featu… (mentioning)
confidence: 99%
“…Due to its new concept and easy implementation, IRM has gained notable visibility recently. There have been further theoretical analyses of the success [132] and failure cases of IRM [133], and IRM has been extended to other tasks including text classification [134] and reinforcement learning [135]. The idea of pursuing invariance of the optimal representation-level classifier has also been extended.…”
Section: Domain-Invariant Representation-Based DG (mentioning)
confidence: 99%
“…We show how representations learned using PSM, when faced with semantically equivalent environments, can learn the main factors of variation and ignore spurious correlations that hinder generalization. We use LQR with distractors (Song et al., 2019; Sonar et al., 2020) to assess generalization in a feature-based RL setting with linear function approximation. The distractors are input features that are spuriously correlated with optimal actions and can be used for predicting these actions during training, but hurt generalization.…”
Section: LQR With Spurious Correlations (mentioning)
confidence: 99%
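For concreteness, below is a small sketch of the kind of LQR-with-distractors construction this citation statement describes: the observation concatenates the true state with a feature that tracks the optimal action during training but is pure noise at test time. The dynamics-free setup, the gain matrix `K`, and the noise scale are placeholder assumptions, not the exact construction of Song et al. or Sonar et al.

```python
import numpy as np

# Illustrative sketch (assumed setup, not the cited papers' exact construction):
# the observation concatenates the true LQR state with a "distractor" feature
# that leaks the optimal action during training but is pure noise at test time.

rng = np.random.default_rng(0)
state_dim = 4
K = 0.5 * np.eye(state_dim)                 # stand-in for the optimal LQR gain

def observe(x, train=True):
    u_star = -K @ x                         # action the optimal controller would take
    noise = rng.normal(size=state_dim)
    distractor = u_star + 0.01 * noise if train else noise
    return np.concatenate([x, distractor])  # distractor predicts u_star only in training

# Training-time and test-time observations of the same state: a predictor that
# leans on the distractor coordinates fits the training data but breaks at test time.
x = rng.normal(size=state_dim)
print(observe(x, train=True))
print(observe(x, train=False))
```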