Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/376

Only Relevant Information Matters: Filtering Out Noisy Samples To Boost RL

Abstract: In reinforcement learning, policy gradient algorithms optimize the policy directly and rely on efficiently sampling the environment. Nevertheless, while most sampling procedures are based on direct policy sampling, self-performance measures could be used to improve such sampling before each policy update. Following this line of thought, we introduce SAUNA, a method in which non-informative transitions are rejected from the gradient update. The level of information is estimated according to the fraction o…
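The abstract is truncated above, so the paper's exact filtering criterion is not reproduced here. As a rough, hedged illustration of the general idea only, the sketch below scores transitions in a rollout batch with an explained-variance-style measure and drops the least informative ones before the policy gradient update; the score definition, the ranking direction, and `keep_fraction` are illustrative assumptions, not the paper's actual rule.

```python
import numpy as np

def vex_scores(returns, values):
    """Illustrative per-transition score: how much of the return's variance
    the value prediction fails to explain (assumption, not SAUNA's exact criterion)."""
    var_ret = np.var(returns) + 1e-8
    return ((returns - values) ** 2) / var_ret

def filter_batch(obs, actions, returns, values, keep_fraction=0.8):
    """Drop the transitions judged least informative before the gradient update.
    keep_fraction is a hypothetical hyperparameter."""
    scores = vex_scores(returns, values)
    k = max(1, int(len(returns) * keep_fraction))
    # Assumption: a large normalized value error marks a transition the agent
    # still has something to learn from, so the k highest-scoring ones are kept.
    keep = np.sort(np.argsort(scores)[-k:])
    return obs[keep], actions[keep], returns[keep], values[keep]
```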


Cited by 7 publications (15 citation statements). References 11 publications.
“…Exploration, a fundamental component of reinforcement learning, significantly influences sample efficiency and agent performance. In recent years, numerous effective exploration methods have been proposed and extensively researched [7], [8], [12], [13], [15].…”
Section: Related Work
confidence: 99%
“…In response to these challenges, this paper proposes a method to adjust the parameters of intrinsic rewards based on feedback from extrinsic rewards for some existing reward schemes, such as AGAC [12], COUNT [7], and RIDE [13]. Our method tackles the exploration-exploitation trade-off by regulating intrinsic rewards, utilizing variations in the extrinsic reward during the learning process as empirical evidence to guide the adjustment of intrinsic rewards.…”
Section: Introduction
confidence: 99%
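The statement above describes tuning intrinsic-reward parameters from extrinsic-reward feedback. The snippet below is only a sketch of that general idea under assumed design choices (the coefficient name `beta`, the window size, and the additive update rule are all illustrative), not the citing paper's actual mechanism.

```python
from collections import deque

class IntrinsicRewardScaler:
    """Illustrative controller: shrink the intrinsic-reward coefficient when
    extrinsic returns are improving, grow it when they stagnate.
    The rule and constants are assumptions, not the cited method."""

    def __init__(self, beta=0.1, window=50, step=0.01):
        self.beta = beta
        self.step = step
        self.returns = deque(maxlen=window)

    def update(self, episode_return):
        """Adjust beta from the recent trend in extrinsic episode returns."""
        self.returns.append(episode_return)
        if len(self.returns) < self.returns.maxlen:
            return self.beta
        half = self.returns.maxlen // 2
        old = sum(list(self.returns)[:half]) / half
        new = sum(list(self.returns)[half:]) / half
        # If extrinsic returns are rising, rely less on the exploration bonus.
        if new > old:
            self.beta = max(0.0, self.beta - self.step)
        else:
            self.beta = min(1.0, self.beta + self.step)
        return self.beta

    def shape(self, extrinsic_reward, intrinsic_reward):
        """Combine the two reward streams with the current coefficient."""
        return extrinsic_reward + self.beta * intrinsic_reward
```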
“…A landscape that, in addition, will be perfectly tailored to each procedurally generated episode of an environment. Another promising algorithm and one that deviates substantially from the former ones is AGAC (Flet-Berliac et al, 2021), which has been successfully tested in the same benchmarks as RIDE. AGAC trains a policy and a value network, as various other methods do, but critically it also trains an adversary network.…”
Section: Procedural Generation
confidence: 99%
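The excerpt summarizes AGAC's architecture: a policy, a value function, and an adversary network. As a minimal sketch of how such an adversary can shape exploration, assuming the adversary is trained to match the policy's action distribution while the policy is rewarded for diverging from it, the fragment below computes both quantities; AGAC's exact losses and coefficients may differ.

```python
import torch
import torch.nn.functional as F

def divergence_bonus(policy_logits, adversary_logits):
    """Per-state KL(pi || pi_adv), usable as an exploration bonus that grows
    when the adversary fails to predict the policy (illustrative form)."""
    log_pi = F.log_softmax(policy_logits, dim=-1)
    log_adv = F.log_softmax(adversary_logits, dim=-1)
    return torch.sum(log_pi.exp() * (log_pi - log_adv), dim=-1)

def adversary_loss(policy_logits, adversary_logits):
    """Train the adversary to imitate the current policy's action distribution."""
    pi = F.softmax(policy_logits, dim=-1).detach()
    log_adv = F.log_softmax(adversary_logits, dim=-1)
    return F.kl_div(log_adv, pi, reduction="batchmean")
```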
“…The options framework [4] provides a method to automatically extract temporally extended skills for a long horizon task with the use of options, which are sub-policies that can be leveraged by some other policy in a hierarchical manner. The process of learning such temporal abstractions has been widely studied in the broad domain of hierarchical reinforcement learning [5]. In this paper, we provide an alternate approach for learning options sequentially without a higher-level policy and show a better performance on navigation tasks.…”
Section: Introduction
confidence: 99%
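For readers unfamiliar with the options framework referenced above, an option is usually defined as a triple of an initiation set, an intra-option policy, and a termination condition. The data structure below encodes that triple; the field names and the execution loop are illustrative, and the `env.step` signature follows the classic Gym 4-tuple convention.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Option:
    """An option in the sense of the options framework: where it can start,
    how it acts, and when it stops."""
    can_initiate: Callable[[Any], bool]       # initiation set I
    policy: Callable[[Any], Any]              # intra-option policy pi
    termination_prob: Callable[[Any], float]  # termination condition beta

def run_option(env, state, option, max_steps=100):
    """Execute a single option until it terminates (illustrative rollout loop)."""
    total_reward = 0.0
    for _ in range(max_steps):
        action = option.policy(state)
        state, reward, done, _ = env.step(action)
        total_reward += reward
        if done or random.random() < option.termination_prob(state):
            break
    return state, total_reward
```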