2021
DOI: 10.1109/lra.2021.3064284
Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning

Abstract: Autonomous car racing is a major challenge in robotics. It raises fundamental problems for classical approaches such as planning minimum-time trajectories under uncertain dynamics and controlling the car at the limits of its handling. Besides, the requirement of minimizing the lap time, which is a sparse objective, and the difficulty of collecting training data from human experts have also hindered researchers from directly applying learning-based approaches to solve the problem. In the present work, we propos…

Cited by 88 publications (78 citation statements)
References 24 publications
“…On the other hand, for more complex tasks that involve multiple sub-goals or equifinal possibilities to achieve task success, the highly tuned, near-optimal dynamics of DRL policies is also their downfall in that these policies can quickly and significantly diverge from those preferred (Carroll et al., 2019) or even attainable by human actors (Fuchs et al., 2020). This results in DRL agent behavior that is either incompatible or non-reciprocal with respect to human behavior (Carroll et al., 2019), or difficult for humans to predict (Shek, 2019), even requiring the human user to be more-or-less enslaved to the behavioral dynamics of the DRL agent to achieve task success (Shah and Carroll, 2019).…”
Section: Discussion
confidence: 99%
“…Rapid advances in the field of Deep Reinforcement Learning (DRL; Berner et al., 2019; Vinyals et al., 2019) over the past several years have led to artificial agents (AAs) capable of producing behavior that meets or exceeds human-level performance across a wide variety of tasks. Some notable advancements include DRL agents learning to play solo or multi-agent video games [e.g., Atari 2600 games (Bellemare et al., 2013), DOTA (Berner et al., 2019), Gran Turismo Sport (Fuchs et al., 2020), StarCraft II (Vinyals et al., 2019)], and even combining DRL with natural language processing (NLP) to win at text-based games such as Zork (Ammanabrolu et al., 2020). There have also been major developments in the application of DRL agents to physical systems, including applying DRL to automate a complex manufacturing-like process for the control of a foosball game table (De Blasi et al., 2021), using DRL to control a robotic agent during a collaborative human-machine maze game (Shafti et al., 2020), and for the control of a robotic hand performing valve rotation and finger gaiting (Morgan et al., 2021).…”
Section: Introduction
confidence: 99%
“…To compete with the best human drivers, GT Sophy had to be sufficiently fast in a time trial setting. The agent was given a progress reward [23] for the speed with which it advanced around the track, and an off-course penalty if it went out of bounds. This incremental reward function allowed the agent to quickly receive positive rewards for staying on the track and driving fast.…”
Section: Race Car Control
confidence: 99%
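As a loose illustration of the incremental reward described in the snippet above, the sketch below combines a course-progress term with a speed-proportional off-course penalty. The function name, the penalty weight, and the exact penalty form are assumptions for illustration, not the formulation used in the cited work.

```python
def step_reward(progress_prev_m, progress_now_m, off_course, speed_mps,
                off_course_weight=0.5):
    """Incremental reward: metres of progress along the track since the last
    observation, minus a speed-proportional penalty while the car is off course.
    Weight and penalty form are illustrative assumptions."""
    reward = progress_now_m - progress_prev_m
    if off_course:
        reward -= off_course_weight * speed_mps
    return reward

# Example: 4.2 m of progress while briefly off course at 60 m/s
print(step_reward(1203.5, 1207.7, off_course=True, speed_mps=60.0))
```

Because every step of on-track progress yields a positive reward, the agent gets dense feedback long before it can complete a fast lap, which is exactly what the snippet means by "quickly receive positive rewards."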
“…Course progress - Following previous work [23], the primary reward component rewarded the amount of progress made along the track since the last observation. To measure progress, we made use of the state variable l that measured the length (in […]) particularly easy to cut, the penalty was proportional to the speed (not squared) and the penalty was doubled for the difficult first and final chicane.…”
Section: Rewards
confidence: 99%
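The state variable l mentioned in this snippet can be read as an arc length along the track centre line. One plausible way to obtain it, sketched below as an assumption rather than the cited work's actual method, is to project the car's position onto a densely sampled centre line and return the accumulated length at the nearest sample.

```python
import numpy as np

def centerline_arclength(car_xy, centerline_xy, cumulative_length_m):
    """Hypothetical helper: approximate the progress variable l by finding the
    nearest sample of the track centre line to the car and returning the
    accumulated arc length at that sample."""
    distances = np.linalg.norm(centerline_xy - car_xy, axis=1)
    return cumulative_length_m[np.argmin(distances)]

# Toy example: a straight centre line sampled every 10 m
centerline = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]])
arclength = np.array([0.0, 10.0, 20.0])
print(centerline_arclength(np.array([11.0, 1.5]), centerline, arclength))  # 10.0
```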
“…They adopted the deep deterministic policy gradient (DDPG) algorithm to manage complex road curvatures, states, and action spaces in a continuous domain and tested the approach in an open-source 3D car racing simulator called "TORCS" [19]. The Robotics and Perception Group at the University of Zurich created an autonomous agent for the GT Sport car racing simulator [20] that matched or outperformed human experts in time trials; this worked by defining a reward function that formulated the racing problem and a neural network policy that mapped input states to actions; the policy parameters were then optimized by maximizing the reward function with the soft actor-critic algorithm [21]. Reference [22] introduced a robust drift controller based on an RL framework with a soft actor-critic algorithm and used the "CARLA" simulator [23] for training and validation.…”
Section: Introduction
confidence: 99%
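To make the "neural network policy for mapping input states to actions" from the last snippet concrete, here is a minimal actor sketch. The observation contents, layer sizes, and two-dimensional action are assumptions; in the cited work such a policy would be trained with the soft actor-critic algorithm rather than used untrained as shown here.

```python
import torch
import torch.nn as nn

class RacingPolicy(nn.Module):
    """Minimal actor network mapping an observation vector (e.g. velocity,
    curvature look-ahead, rangefinder readings) to continuous steering and
    throttle/brake commands in [-1, 1]. Sizes are illustrative assumptions."""
    def __init__(self, obs_dim=30, act_dim=2, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

policy = RacingPolicy()
action = policy(torch.randn(1, 30))  # e.g. [[steering, throttle]]
```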