2022 IEEE 61st Conference on Decision and Control (CDC)
DOI: 10.1109/cdc51059.2022.9992450
Risk-Averse Reinforcement Learning via Dynamic Time-Consistent Risk Measures

Abstract: Traditional reinforcement learning (RL) aims to maximize the expected total reward, while in a risk-averse setting the risk of uncertain outcomes must also be controlled to ensure reliable performance. In this paper, we consider the problem of maximizing the dynamic risk of a sequence of rewards in infinite-horizon Markov Decision Processes (MDPs). We adapt Expected Conditional Risk Measures (ECRMs) to the infinite-horizon risk-averse MDP and prove their time consistency. Using a convex combination of expectation…
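The abstract is cut off mid-sentence, but it points to a one-step conditional risk measure built as a convex combination of an expectation and a tail-risk term, composed recursively over an infinite horizon. The sketch below is only a minimal illustration of that idea under stated assumptions, not the paper's algorithm: it assumes the second term is a left-tail CVaR on rewards and plugs the combined measure into a value-iteration-style backup. All names and parameters (left_tail_cvar, ecrm_backup, lam, alpha, the toy MDP) are hypothetical.

```python
import numpy as np

def left_tail_cvar(values, probs, alpha):
    """Average of the worst (lowest-reward) alpha-fraction of a discrete
    distribution, i.e. a conditional value-at-risk of the lower tail."""
    order = np.argsort(values)                  # worst rewards first
    v, p = values[order], probs[order]
    cum = np.cumsum(p)
    # probability mass each outcome contributes inside the alpha-tail
    tail_mass = np.minimum(p, np.maximum(alpha - (cum - p), 0.0)) / alpha
    return float(np.dot(tail_mass, v))

def ecrm_backup(P, R, V, gamma=0.95, lam=0.5, alpha=0.1):
    """One risk-averse Bellman backup using the one-step risk map
    rho(Z) = (1 - lam) * E[Z] + lam * CVaR_alpha(Z) on Z = r + gamma * V(s')."""
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for s in range(n_states):
        for a in range(n_actions):
            z = R[s, a, :] + gamma * V           # one-step returns
            p = P[s, a, :]
            Q[s, a] = (1.0 - lam) * p @ z + lam * left_tail_cvar(z, p, alpha)
    return Q.max(axis=1)                         # greedy risk-averse value update

# Tiny two-state example (purely illustrative numbers).
P = np.array([[[0.8, 0.2], [0.5, 0.5]],
              [[0.1, 0.9], [0.6, 0.4]]])         # P[s, a, s']
R = np.array([[[1.0, -2.0], [0.5, 0.5]],
              [[0.0,  2.0], [1.0, -1.0]]])       # R[s, a, s']
V = np.zeros(2)
for _ in range(200):
    V = ecrm_backup(P, R, V)
print(V)
```

In this nested composition the one-step risk map is applied stage by stage to r + gamma * V(s'), which is the standard way such dynamic risk measures achieve the time consistency the abstract refers to.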

Cited by: 1 publication (1 citation statement)
References: 31 publications
“…Different researchers use different approaches to arrive at a general solution. In particular, [2] presents a model for checking the class hierarchy based on classifications, while [3] obtains a solution to the problem with an unbounded parameterized value of softmax. The research in [4] proposes a new adversarial-learning method based on detecting and removing large 'weight' coefficients rather than reducing them algorithmically.…”
Section: Introduction (mentioning)
Confidence: 99%