2017
DOI: 10.1007/978-3-319-71589-6_36

Survival-Oriented Reinforcement Learning Model: An Efficient and Robust Deep Reinforcement Learning Algorithm for Autonomous Driving Problem

Cited by 13 publications (5 citation statements) | References 6 publications
“…In multifidelity reinforcement learning, policies learned in low-fidelity simulators were transferred to high-fidelity simulators as exploration heuristics, finding optimal policies with less data [28]. A constrained Markov Decision Process (MDP) called survival-oriented RL, which takes survival (negative avoidance) as the first priority rather than reward maximization, is considered in [29] for ensuring safety. Multi-agent RL has been used for high-level strategic decision making, such as overtaking and following vehicles, using dynamic coordination graphs [30].…”
Section: Modern Approaches
confidence: 99%
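To make the survival-first priority in the statement above concrete, here is a minimal sketch of action selection in a constrained MDP where an estimated danger score (in the spirit of a negative-avoidance signal) screens actions before reward is maximized. The names (q_values, danger, THRESHOLD) and the thresholding rule are illustrative assumptions, not the paper's actual algorithm.

import numpy as np

# Hypothetical sketch: survival (danger avoidance) filters actions before
# reward maximization, mirroring "survival first, reward second".
THRESHOLD = 0.2  # assumed maximum tolerable danger score

def select_action(q_values: np.ndarray, danger: np.ndarray) -> int:
    """Return the highest-value action among survivable ones; if no action
    is survivable, fall back to the least dangerous action."""
    safe = np.flatnonzero(danger < THRESHOLD)
    if safe.size == 0:
        return int(np.argmin(danger))             # survival first
    return int(safe[np.argmax(q_values[safe])])   # then maximize reward

# Toy check: action 2 has the best Q-value but is too dangerous, so the
# best survivable action (action 0) is chosen instead.
q = np.array([1.0, 0.5, 3.0])
d = np.array([0.05, 0.10, 0.90])
assert select_action(q, d) == 0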
“…Outside of medicine, Ye et al. (2017) proposed a method that aims to optimize the cumulative reward in a constrained MDP with a negative-avoidance constraint. Their method uses a Negative Avoidance Function (NAF), which plays a role similar to a hazard function.…”
Section: Related Work
confidence: 99%
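In generic constrained-MDP notation, an objective of the kind this statement describes can be written as below. The exact form of the NAF constraint and the bound \delta are assumptions made here for illustration, not taken from Ye et al. (2017):

\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, \mathrm{NAF}(s_t, a_t)\right] \le \delta,

where \mathrm{NAF}(s_t, a_t) scores how strongly a state-action pair should be avoided, playing the hazard-function-like role the statement mentions.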
“…External models are not limited to RL policies. For example, a danger index can be predicted with a Negative-Avoidance Function [35], and intrinsic fear can be approximated by a supervised learning model [5]; both can intervene on the original action and improve safety. Researchers have also investigated reset policies [36], [37], [38], which are trained jointly with the task policy to gradually expand recoverable regions.…”
Section: B. Conservative Exploration
confidence: 99%
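As a rough illustration of the intervention pattern this statement describes (an external danger model vetoing the policy's action before execution), consider the sketch below. The callables policy, danger_model, and safe_fallback and the threshold rule are hypothetical stand-ins, not an API from any of the cited works.

# Hypothetical shield layer: an externally trained danger model can veto
# the policy's action and substitute a safer one before execution.
def shielded_step(state, policy, danger_model, safe_fallback, threshold=0.5):
    action = policy(state)                  # original RL action
    if danger_model(state, action) > threshold:
        action = safe_fallback(state)       # intervene before execution
    return action

# Toy usage with stand-in models: the risky action is overridden.
act = shielded_step(
    state=0.0,
    policy=lambda s: 1,             # e.g. "accelerate"
    danger_model=lambda s, a: 0.9,  # flags the action as risky
    safe_fallback=lambda s: 0,      # e.g. "brake"
)
assert act == 0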