2021
DOI: 10.1109/lra.2021.3064284
Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning

Abstract: Autonomous car racing is a major challenge in robotics. It raises fundamental problems for classical approaches such as planning minimum-time trajectories under uncertain dynamics and controlling the car at the limits of its handling. Besides, the requirement of minimizing the lap time, which is a sparse objective, and the difficulty of collecting training data from human experts have also hindered researchers from directly applying learning-based approaches to solve the problem. In the present work, we propos…

Cited by 88 publications (78 citation statements)
References 24 publications
“…On the other hand, for more complex tasks that involve multiple sub-goals or equifinal possibilities to achieve task success, the highly tuned, near-optimal dynamics of DRL policies is also their downfall in that these policies can quickly and significantly diverge from those preferred (Carroll et al., 2019) or even attainable by human actors (Fuchs et al., 2020). This results in DRL agent behavior that is either incompatible or non-reciprocal with respect to human behavior (Carroll et al., 2019), or difficult for humans to predict (Shek, 2019), even requiring the human user to be more-or-less enslaved to the behavioral dynamics of the DRL agent to achieve task success (Shah and Carroll, 2019).…”
Section: Discussion
confidence: 99%
“…Rapid advances in the field of Deep Reinforcement Learning (DRL; Berner et al., 2019; Vinyals et al., 2019) over the past several years have led to artificial agents (AAs) capable of producing behavior that meets or exceeds human-level performance across a wide variety of tasks. Some notable advancements include DRL agents learning to play solo or multi-agent video games [e.g., Atari 2600 games (Bellemare et al., 2013), DOTA (Berner et al., 2019), Gran Turismo Sport (Fuchs et al., 2020), StarCraft II (Vinyals et al., 2019)], and even combining DRL with natural language processing (NLP) to win at text-based games such as Zork (Ammanabrolu et al., 2020). There have also been major developments in the application of DRL agents to physical systems, including applying DRL to automate a complex manufacturing-like process for the control of a foosball game table (De Blasi et al., 2021), using DRL to control a robotic agent during a collaborative human-machine maze game (Shafti et al., 2020), and for the control of a robotic hand performing valve rotation and finger gaiting (Morgan et al., 2021).…”
Section: Introduction
confidence: 99%
“…To compete with the best human drivers, GT Sophy had to be sufficiently fast in a time trial setting. The agent was given a progress reward [23] for the speed with which it advanced around the track, and an off-course penalty if it went out of bounds. This incremental reward function allowed the agent to quickly receive positive rewards for staying on the track and driving fast.…”
Section: Race Car Control
confidence: 99%
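As a loose illustration of the incremental reward described in the snippet above, the sketch below combines a course-progress term with a speed-proportional off-course penalty. The function name, the penalty weight, and the exact penalty form are assumptions for illustration, not the formulation used in the cited work.

```python
def step_reward(progress_prev_m, progress_now_m, off_course, speed_mps,
                off_course_weight=0.5):
    """Incremental reward: metres of progress along the track since the last
    observation, minus a speed-proportional penalty while the car is off course.
    Weight and penalty form are illustrative assumptions."""
    reward = progress_now_m - progress_prev_m
    if off_course:
        reward -= off_course_weight * speed_mps
    return reward

# Example: 4.2 m of progress while briefly off course at 60 m/s
print(step_reward(1203.5, 1207.7, off_course=True, speed_mps=60.0))
```

Because every step of on-track progress yields a positive reward, the agent gets dense feedback long before it can complete a fast lap, which is exactly what the snippet means by "quickly receive positive rewards."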
“…Course progress - Following previous work [23], the primary reward component rewarded the amount of progress made along the track since the last observation. To measure progress, we made use of the state variable l that measured the length (in […]) particularly easy to cut, the penalty was proportional to the speed (not squared) and the penalty was doubled for the difficult first and final chicane.…”
Section: Rewards
confidence: 99%
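The state variable l mentioned in this snippet can be read as an arc length along the track centre line. One plausible way to obtain it, sketched below as an assumption rather than the cited work's actual method, is to project the car's position onto a densely sampled centre line and return the accumulated length at the nearest sample.

```python
import numpy as np

def centerline_arclength(car_xy, centerline_xy, cumulative_length_m):
    """Hypothetical helper: approximate the progress variable l by finding the
    nearest sample of the track centre line to the car and returning the
    accumulated arc length at that sample."""
    distances = np.linalg.norm(centerline_xy - car_xy, axis=1)
    return cumulative_length_m[np.argmin(distances)]

# Toy example: a straight centre line sampled every 10 m
centerline = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]])
arclength = np.array([0.0, 10.0, 20.0])
print(centerline_arclength(np.array([11.0, 1.5]), centerline, arclength))  # 10.0
```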
“…They adopted the deep deterministic policy gradient (DDPG) algorithm to manage complex road curvatures, states, and action spaces in a continuous domain and tested the approach in an open-source 3D car racing simulator called "TORCS" [19]. The Robotics and Perception Group at the University of Zurich created an autonomous agent for the GT Sport car racing simulator [20] that matched or outperformed human experts in time trials; this worked by defining a reward function that formulated the racing problem and a neural network policy that mapped input states to actions; the policy parameters were then optimized by maximizing the reward function with the soft actor-critic algorithm [21]. Reference [22] introduced a robust drift controller based on an RL framework with a soft actor-critic algorithm and used the "CARLA" simulator [23] for training and validation.…”
Section: Introduction
confidence: 99%
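To make the "neural network policy for mapping input states to actions" from the last snippet concrete, here is a minimal actor sketch. The observation contents, layer sizes, and two-dimensional action are assumptions; in the cited work such a policy would be trained with the soft actor-critic algorithm rather than used untrained as shown here.

```python
import torch
import torch.nn as nn

class RacingPolicy(nn.Module):
    """Minimal actor network mapping an observation vector (e.g. velocity,
    curvature look-ahead, rangefinder readings) to continuous steering and
    throttle/brake commands in [-1, 1]. Sizes are illustrative assumptions."""
    def __init__(self, obs_dim=30, act_dim=2, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

policy = RacingPolicy()
action = policy(torch.randn(1, 30))  # e.g. [[steering, throttle]]
```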