2020
DOI: 10.48550/arxiv.2002.05229
Preprint

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Abstract: Deep Reinforcement Learning (RL) has proven powerful for decision making in simulated environments. However, training deep RL models is challenging in real-world applications such as production-scale health-care or recommender systems, because interaction is expensive and the budget at deployment is limited. One aspect of the data inefficiency comes from the expensive hyper-parameter tuning required when optimizing deep neural networks. We propose Adaptive Behavior Policy Sharing (ABPS), a data-efficient trainin…
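The abstract only sketches ABPS at a high level. Below is a minimal illustrative sketch of the stated idea: a pool of off-policy agents, each trained with different hyper-parameters, shares the experience gathered by a single behavior policy that is adaptively selected from the pool. All names here (`AgentStub`, `select_behavior`, `abps_train`), the bandit-style selection rule, and the environment interface are assumptions for illustration, not the paper's implementation.

```python
import random
from collections import deque

class AgentStub:
    """Placeholder for an off-policy learner (e.g. a DQN variant)."""
    def __init__(self, hyperparams):
        self.hyperparams = hyperparams

    def act(self, state):
        return random.randrange(2)  # stand-in for the learned policy

    def learn(self, batch):
        pass  # stand-in for a gradient update on the shared batch

def select_behavior(scores, eps=0.1):
    """Epsilon-greedy bandit over each agent's recent episode returns."""
    if random.random() < eps or not any(scores):
        return random.randrange(len(scores))
    means = [sum(s) / len(s) if s else float("-inf") for s in scores]
    return max(range(len(means)), key=means.__getitem__)

def abps_train(env, hyperparam_grid, episodes=100):
    pool = [AgentStub(h) for h in hyperparam_grid]   # ensemble of hyper-parameters
    scores = [deque(maxlen=10) for _ in pool]        # recent returns per agent
    replay = deque(maxlen=10_000)                    # shared experience buffer
    for _ in range(episodes):
        i = select_behavior(scores)                  # adaptively pick one behavior policy
        state, ep_return, done = env.reset(), 0.0, False
        while not done:
            action = pool[i].act(state)
            next_state, reward, done = env.step(action)  # assumed (s', r, done) interface
            replay.append((state, action, reward, next_state, done))
            state, ep_return = next_state, ep_return + reward
        scores[i].append(ep_return)
        batch = random.sample(replay, min(32, len(replay)))
        for agent in pool:                           # every agent learns from the experience
            agent.learn(batch)                       # the selected behavior policy collected
    return pool
```

In this sketch only one agent's policy acts at a time, yet every agent in the pool trains on the resulting transitions, so the cost of hyper-parameter search is amortized over a single stream of environment interaction.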

Cited by 1 publication (1 citation statement) · References 9 publications
“…The difficulty comes from the fact that the state-action space is exponentially large [9], where the optimal policy can not be explored efficiently. A mainstream strategy is to feed numerous trials to RL models in the optimization process to enhance performance [10][11][12][13][14][15][16][17]. Nevertheless, such a sample inefficiency challenges the applicability of RL towards large-scale problems [8], where the requested computational overhead is expensive or even unaffordable.…”
Section: Introduction
confidence: 99%