2020
DOI: 10.48550/arxiv.2005.10696
Preprint

Novel Policy Seeking with Constrained Optimization

Abstract: In this work, we address the problem of seeking novel policies in reinforcement learning tasks. Instead of following the multi-objective framework commonly used in existing methods, we propose to rethink the problem under a novel perspective of constrained optimization. We at first introduce a new metric to evaluate the difference between policies, and then design two practical novel policy seeking methods following the new perspective, namely the Constrained Task Novel Bisector (CTNB), and the Interior Policy…

Cited by 6 publications (8 citation statements)
References 32 publications
“…By contrast, our method utilizes a filtering-based objective via reward switching to strictly enforce all the diversity constraints. Sun et al (2020) adopts a conceptually similar objective by early terminating episodes that do not incur sufficient novelty. However, Sun et al (2020) does not leverage any exploration technique for those rejected samples and may easily suffer from low sample efficiency in challenging RL tasks we consider in this paper.…”
Section: Related Work (mentioning)
confidence: 99%
“…Sun et al (2020) adopts a conceptually similar objective by early terminating episodes that do not incur sufficient novelty. However, Sun et al (2020) does not leverage any exploration technique for those rejected samples and may easily suffer from low sample efficiency in challenging RL tasks we consider in this paper. There is another concurrent work with an orthogonal focus, which directly optimizes diversity with reward constraints (Zahavy et al, 2021).…”
Section: Related Work (mentioning)
confidence: 99%
“…Recently a variety of DRL-based learning methods have been proposed to discover diverse control policies in machine learning, e.g., [Achiam et al 2018; Conti et al 2018; Eysenbach et al 2019; Haarnoja et al 2018; Hester and Stone 2017; Houthooft et al 2016; Schmidhuber 1991; Sharma et al 2019; Sun et al 2020;]. These methods mainly encourage exploration of unseen states or actions by jointly optimizing the task and novelty objectives, or optimizing intrinsic rewards such as heuristically defined curiosity terms [Eysenbach et al 2019; Sharma et al 2019].…”
Section: Diversity Optimization (mentioning)
confidence: 99%
“…Our novel policy search is in principle similar to the idea of [Sun et al 2020;]. However, there are two key differences.…”
Section: Stage 2: Novel Policy Seeking (mentioning)
confidence: 99%
“…(7)). Sun et al [38] also investigated CMDPs, but focused on the setup where the diversity reward has to satisfy a constraint, so the diversity reward is r_e and the extrinsic reward is r_d. But most importantly, we use a different method to solve CMDPs, which is based on Lagrange multipliers and SFs and is justified from CMDP theory [3,8,7], while these other two papers use techniques that are not guaranteed to solve CMDPs.…”
Section: Solving the Constrained MDP (mentioning)
confidence: 99%