2019
DOI: 10.48550/arXiv.1904.12901
Preprint

Challenges of Real-World Reinforcement Learning

Abstract: Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are often hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. We present a set of nine unique challenges that must be addressed to productionize RL to real world problems. For each of these challenges, we specify the exact meaning of the challenge, present some app…

Cited by 114 publications (149 citation statements). References 26 publications.
“…CMDP. The study of RL algorithms for CMDPs has received considerable attention due to the safety requirement (Altman, 1999; Paternain et al., 2019; Yu et al., 2019; Dulac-Arnold et al., 2019; García & Fernández, 2015). Our work is closely related to Lagrangian-based CMDP algorithms with optimistic policy evaluations (Efroni et al., 2020; Singh et al., 2020; Ding et al., 2021; Liu et al., 2021; Qiu et al., 2020).…”
Section: Related Work
Citation type: mentioning; confidence: 99%
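For context, the constrained MDP (CMDP) objective that these Lagrangian-based methods address can be written as a saddle-point problem. The following is the standard formulation, in our own notation rather than that of any single cited paper:

\[
\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\, c(s_t, a_t)\Big] \le \tau,
\]

which the Lagrangian relaxation turns into

\[
\max_{\pi} \min_{\lambda \ge 0}\; V_r(\pi) - \lambda \big( V_c(\pi) - \tau \big),
\]

where \(V_r(\pi)\) and \(V_c(\pi)\) are the expected discounted reward and cost of policy \(\pi\), and the multiplier \(\lambda\) prices violations of the cost budget \(\tau\).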
“…Safe reinforcement learning (RL) studies how an agent learns to maximize its expected total reward by interacting with an unknown environment over time while dealing with restrictions/constraints arising from real-world problems (Amodei et al., 2016; Dulac-Arnold et al., 2019; García & Fernández, 2015). A standard approach for mod…”
Section: Introduction
Citation type: mentioning; confidence: 99%
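To make the primal-dual idea behind such Lagrangian methods concrete, here is a minimal runnable sketch on a one-state CMDP (a constrained bandit). The problem data, step sizes, and variable names are our own hypothetical illustration, not code from any of the cited papers:

import numpy as np

# Constrained bandit: maximize E[reward] subject to E[cost] <= budget.
# All numbers below are made up for illustration.
rewards = np.array([1.0, 0.6, 0.2])   # expected reward of each action
costs   = np.array([0.9, 0.4, 0.1])   # expected cost of each action
budget  = 0.5

theta = np.zeros(3)   # softmax policy logits
lam = 0.0             # Lagrange multiplier

for _ in range(5000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    z = rewards - lam * costs
    # Primal step: exact softmax policy gradient ascent on E_pi[reward - lam * cost].
    theta += 0.1 * pi * (z - pi @ z)
    # Dual step: raise lam when the constraint is violated; project onto lam >= 0.
    lam = max(0.0, lam + 0.01 * (pi @ costs - budget))

print("policy:", pi.round(3))
print("E[reward]:", round(float(pi @ rewards), 3), "E[cost]:", round(float(pi @ costs), 3))

The dual step increases lam whenever the average cost exceeds the budget, which steers the primal step toward cheaper actions; at convergence the policy mixes actions so that the cost constraint roughly binds.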
“…An important limitation of DRL methods is their sample inefficiency: an enormous amount of data is necessary, which makes training expensive. This makes applying DRL in the real world challenging, for example in robotics (Sünderhauf et al., 2018; Dulac-Arnold et al., 2019). In tasks like manipulation, sample collection is a slow and costly process (Liu et al., 2021).…”
Section: Introduction
Citation type: mentioning; confidence: 99%
“…The design of adequate reward functions poses a tremendous challenge for building reinforcement learning (RL) agents that ought to act in accordance with human intentions [4,13]. Besides complicating the deployment of RL in the real world [11], this can lead to major unforeseen societal impacts, which need to be accounted for when building autonomous systems [6,45]. To tackle this, the field of value alignment has largely focused on reward learning, which aims to adopt a bottom-up approach of finding goal specifications from observational data instead of manually specifying them [22,30,40].…”
Section: Introduction
Citation type: mentioning; confidence: 99%