2023
DOI: 10.3390/a16070325

Risk-Sensitive Policy with Distributional Reinforcement Learning

Abstract: Classical reinforcement learning (RL) techniques are generally concerned with the design of decision-making policies driven by the maximisation of the expected outcome. Nevertheless, this approach does not take into consideration the potential risk associated with the actions taken, which may be critical in certain applications. To address this issue, the present research work introduces a novel methodology based on distributional RL to derive sequential decision-making policies that are sensitive to the risk,…
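To illustrate the idea summarised in the abstract, the sketch below shows how a policy can be made risk-sensitive once a distributional critic is available: each action is associated with a set of estimated return quantiles, and actions are ranked by a risk measure of that distribution rather than by its mean. This is a minimal, hedged example, not the paper's implementation; the names (`select_action`, `quantiles`, `alpha`) and the quantile representation are assumptions for illustration.

```python
import numpy as np

def select_action(quantiles, risk_measure="cvar", alpha=0.25):
    """Pick an action from per-action return quantiles using a risk measure.

    quantiles: array of shape (num_actions, num_quantiles) holding an
    estimated return distribution for each action (e.g. from a
    quantile-based distributional critic). Instead of ranking actions by
    the mean (risk-neutral), we rank them by a risk-sensitive statistic.
    """
    if risk_measure == "mean":       # classical, risk-neutral criterion
        scores = quantiles.mean(axis=1)
    elif risk_measure == "cvar":     # mean of the worst alpha-fraction of outcomes
        sorted_q = np.sort(quantiles, axis=1)
        k = max(1, int(alpha * quantiles.shape[1]))
        scores = sorted_q[:, :k].mean(axis=1)
    else:
        raise ValueError(f"unknown risk measure: {risk_measure}")
    return int(np.argmax(scores))

# Example: two actions, 8 quantile estimates each.
q = np.array([[ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],   # safe action
              [-5.0, 0.0, 0.0, 2.0, 2.0, 3.0, 4.0, 6.0]])  # risky action, higher mean
print(select_action(q, "mean"))   # 1: the risky action wins on expectation
print(select_action(q, "cvar"))   # 0: the safe action wins under CVaR
```

The example makes the trade-off explicit: the risky action has the higher expected return, but a CVaR-based criterion prefers the safe action because it focuses on the lower tail of the return distribution.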

Cited by 6 publications (2 citation statements)
References 13 publications
“…In risk-sensitive RL, the focus is on minimizing a measure of the risk induced by the cumulative rewards, rather than maximizing the expected return. Risk-sensitive RL has gained attention in recent years [40–43]. Various risk measures may be considered, such as Exponential Utility [44], Cumulative Prospect Theory [45], or Conditional Value-at-Risk (CVaR) [46].…”
Section: Related Work
confidence: 99%
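To make two of the risk measures named in this statement concrete, the sketch below computes the Exponential Utility objective and CVaR from a sample of cumulative returns. It is a minimal illustration under assumed names and parameters (`beta`, `alpha`, the Gaussian test sample), not code from the cited works.

```python
import numpy as np

def exponential_utility(returns, beta=1.0):
    """Exponential Utility objective: -(1/beta) * log E[exp(-beta * R)].

    For beta > 0 this penalises spread in the return R (risk-averse);
    as beta -> 0 it recovers the plain expectation E[R].
    """
    returns = np.asarray(returns, dtype=float)
    return -np.log(np.mean(np.exp(-beta * returns))) / beta

def cvar(returns, alpha=0.1):
    """Conditional Value-at-Risk: mean of the worst alpha-fraction of returns."""
    sorted_r = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(alpha * len(sorted_r)))
    return sorted_r[:k].mean()

# Assumed test distribution of cumulative returns.
samples = np.random.normal(loc=1.0, scale=2.0, size=10_000)
print(np.mean(samples))                   # risk-neutral value, ~1.0
print(exponential_utility(samples, 1.0))  # ~ mean - beta * variance / 2 for a Gaussian
print(cvar(samples, 0.1))                 # value of the worst 10% of outcomes
```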
“…Such innovations have been shown to improve the performance of deep RL agents on benchmark tasks due to improved statistical robustness, and evidence of distributional RL-like computations has been reported in midbrain DANs of both mice and primates [7–9]; however, the direct functional relevance of such distributional mechanisms and representations to behavior is unknown. In the engineering setting, deep RL agents vary in whether and how they make use of knowledge about the distribution over future rewards when selecting actions [10–12], and decoding of reward distributions from DAN activity has only been demonstrated at the time of reward delivery [7]. In principle, knowing in advance, at the start of an episode, about the range and likelihood of rewards available and when they are likely to occur could be highly useful for planning and flexible behavior, particularly in the face of non-stationarity in either the environment or internal state (e.g.…”
Section: Introduction
confidence: 99%