2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018
DOI: 10.1109/iros.2018.8593871

Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning

Abstract: Robots that navigate among pedestrians use collision avoidance algorithms to enable safe and efficient operation. Recent works present deep reinforcement learning as a framework to model the complex interactions and cooperation. However, they are implemented using key assumptions about other agents' behavior that deviate from reality as the number of agents in the environment increases. This work extends our previous approach to develop an algorithm that learns collision avoidance among a variety of types of d…

Cited by 518 publications (336 citation statements)
References: 22 publications
“…This RL framework applies a reward function, $R_{col}(s^{jn}, u)$, to penalize the agent in case of collision, and reward in case of reaching its goal. Two different types of RL algorithms are used in this RL framework, value-based [22], [15] and policy-based [14] learning. Value-based algorithm assumes that other agents continue their current velocities until next step, $\Delta t$, to be able to extract policy from the value function, $V(s^{jn}_t)$.…”
Section: A Collision Avoidance With Deep RL (GA3C-CADRL)
Citation type: mentioning
confidence: 99%
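
The value-based policy extraction described in this statement can be illustrated with a one-step lookahead. The sketch below is a minimal illustration, not the cited authors' implementation: the function names (`propagate`, `select_action`, `toy_value`, `toy_reward`), the collision distance, the reward magnitudes, and the plain discount factor `gamma` are all hypothetical choices made for this example. It predicts the other agent's next position by assuming it keeps its current velocity for one step of length `dt`, then picks the candidate velocity that maximizes reward plus discounted value.

```python
import math

def propagate(pos, vel, dt):
    """Constant-velocity prediction: the agent keeps `vel` for one step of length dt."""
    return (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)

def select_action(ego_pos, goal, other_pos, other_vel, actions,
                  value_fn, reward_fn, dt=0.2, gamma=0.97):
    """Pick the candidate velocity that maximizes reward + gamma * value.

    The other agent's next position is predicted under the assumption that
    it continues at its current velocity until the next step.
    """
    next_other = propagate(other_pos, other_vel, dt)
    best_action, best_score = None, -math.inf
    for u in actions:
        next_ego = propagate(ego_pos, u, dt)
        score = (reward_fn(next_ego, next_other, goal)
                 + gamma * value_fn(next_ego, next_other, goal))
        if score > best_score:
            best_action, best_score = u, score
    return best_action

# Hypothetical stand-ins for a learned value network and the reward function.
def toy_value(ego, other, goal):
    return -math.dist(ego, goal)  # closer to the goal is better

def toy_reward(ego, other, goal):
    if math.dist(ego, other) < 0.6:   # assumed collision distance
        return -10.0                  # assumed collision penalty
    if math.dist(ego, goal) < 0.1:    # assumed goal tolerance
        return 1.0
    return 0.0

if __name__ == "__main__":
    candidates = [(1.0, 0.0), (0.0, 1.0)]
    # The other agent is predicted to move into the ego agent's straight-line path.
    chosen = select_action((0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (0.0, -1.0),
                           candidates, toy_value, toy_reward, dt=1.0, gamma=0.9)
    print(chosen)
```

Because the lookahead depends entirely on the constant-velocity prediction, the selected action is only as good as that assumption, which is the limitation the statement points to.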
“…Many different curriculum training paradigms used in the literature. For example, [14] starts training with SL, then runs two RL phases: one with 2-4 agents in the environment and next with 4-10 agents. [16] uses a two-stage training process, the first stage has 20 agents placed randomly in a simple environment without any obstacle.…”
Section: B Training the Policy
Citation type: mentioning
confidence: 99%
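
The two RL phases that this statement attributes to [14] can be written down as a small curriculum schedule. This is a sketch under stated assumptions: the `Phase` dataclass, the phase names, and `sample_num_agents` are hypothetical, and only the agent-count ranges (2-4, then 4-10) come from the statement. The supervised-learning initialization that precedes the RL phases is not modeled here.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Phase:
    name: str        # hypothetical label for the phase
    min_agents: int  # fewest agents in an episode
    max_agents: int  # most agents in an episode

# Agent-count ranges taken from the statement; everything else is illustrative.
# A supervised-learning stage would run before these phases.
CURRICULUM = [
    Phase("rl_phase_1", 2, 4),
    Phase("rl_phase_2", 4, 10),
]

def sample_num_agents(phase, rng):
    """Draw the number of agents for one training episode in this phase."""
    return rng.randint(phase.min_agents, phase.max_agents)

if __name__ == "__main__":
    rng = random.Random(0)
    for phase in CURRICULUM:
        counts = [sample_num_agents(phase, rng) for _ in range(5)]
        print(phase.name, counts)
```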