2022
DOI: 10.48550/arxiv.2203.04749
Preprint

Bilateral Deep Reinforcement Learning Approach for Better-than-human Car Following Model

Cited by 1 publication (4 citation statements); references 0 publications.
Citation types: 0 supporting, 4 mentioning, 0 contrasting.
All citing statements published in 2023.
“…Typically, the reward is a linear combination of several terms including safety, efficiency, comfort, speed-following, energy consumption, and so forth, with one of the terms in the reward function being a safety reward. The safety term is often either a large penalty (negative reward) for a crash (or a very small gap) in training (28), or a large penalty whenever the follower has a low TTC with respect to the leader (3,14). In either case, for agents trained using reward alone, the satisfaction of safety constraints is not guaranteed.…”
Section: Reinforcement Learning and Markov Decision Processes (mentioning)
confidence: 99%
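The statement above describes a common reward design: a linear combination of efficiency and comfort terms plus a safety term realized as a large penalty for a crash or a low time-to-collision (TTC). A minimal Python sketch of that structure follows; all weights, thresholds, and function names here are illustrative assumptions, not taken from any of the cited papers.

```python
# Hedged sketch of a car-following RL reward as a linear combination of
# efficiency, comfort, and safety terms, with safety implemented as a
# large penalty on crash or low TTC. Weights/thresholds are hypothetical.

def ttc(gap: float, closing_speed: float) -> float:
    """Time-to-collision: gap over closing speed (inf if the gap is opening)."""
    return gap / closing_speed if closing_speed > 0 else float("inf")

def reward(gap, follower_speed, leader_speed, accel,
           w_eff=1.0, w_comfort=0.1,
           crash_penalty=-100.0, ttc_threshold=4.0, ttc_penalty=-10.0):
    if gap <= 0:                        # crash: penalty dominates everything
        return crash_penalty
    r = w_eff * follower_speed          # efficiency: reward progress
    r -= w_comfort * accel ** 2         # comfort: penalize harsh accel/braking
    if ttc(gap, follower_speed - leader_speed) < ttc_threshold:
        r += ttc_penalty                # safety: penalize low TTC
    return r
```

As the quoted text notes, nothing in this penalty-based formulation guarantees the agent satisfies the safety constraint; the penalties only shape the learned policy.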
“…Importance of Using a Target Speed Instead of a Target Gap. It is common (for example Zhu et al [3], Shi et al [14], Lin et al [28]) to formulate the efficiency part of the RL car-following reward as following a set target gap. In our work, we instead formulate efficiency as following the dynamic maximal safe next speed.…”
Section: Rewards (mentioning)
confidence: 99%
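The contrast drawn above, tracking a fixed target gap versus tracking a dynamic maximal safe next speed, can be sketched as two alternative efficiency rewards. The safe-speed bound below is a simple kinematic assumption (the follower must be able to stop within the current gap at maximum braking), not the formula from the cited work.

```python
# Hypothetical comparison of the two efficiency formulations: a fixed
# target-gap tracking reward vs. tracking a dynamic maximal safe speed.
import math

def gap_tracking_reward(gap, target_gap=30.0):
    # Target-gap formulation: penalize deviation from a fixed gap (m).
    return -abs(gap - target_gap)

def max_safe_speed(gap, b_max=4.0):
    # Speed from which the follower can still stop within `gap` at
    # maximum deceleration b_max: v**2 / (2 * b_max) <= gap.
    return math.sqrt(max(2.0 * b_max * gap, 0.0))

def speed_tracking_reward(speed, gap, b_max=4.0):
    # Target-speed formulation: penalize deviation from the dynamic
    # maximal safe next speed, which shrinks as the gap closes.
    return -abs(speed - max_safe_speed(gap, b_max))
```

Under the target-speed formulation the efficiency objective adapts to the current gap, so pursuing efficiency never asks the agent to drive faster than it can safely stop.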