In this work, we present a data-driven simulation and training engine capable of learning end-to-end autonomous vehicle control policies using only sparse rewards. By leveraging real, human-collected trajectories through an environment, we render novel training data that allows virtual agents to drive along a continuum of new local trajectories consistent with the road appearance and semantics, each with a different view of the scene. We demonstrate the ability of policies learned within our simulator to generalize to and navigate in previously unseen real-world roads, without access to any human control labels during training. Our results validate the learned policy onboard a full-scale autonomous vehicle, including in previously un-encountered scenarios, such as new roads and novel, complex, near-crash situations. Our methods are scalable, leverage reinforcement learning, and apply broadly to situations requiring effective perception and robust operation in the physical world.

Index Terms - Deep learning in robotics and automation, autonomous agents, real-world reinforcement learning, data-driven simulation.
I. INTRODUCTION

END-TO-END (i.e., perception-to-control) trained neural networks for autonomous vehicles have shown great promise for lane-stable driving [1]-[3]. However, they lack methods to learn robust models at scale and require vast amounts of training data that are time-consuming and expensive to collect. Learned end-to-end driving policies and modular perception components in a driving pipeline require capturing training data from all necessary edge cases, such as recovery from off-orientation positions or even near collisions. This is not only prohibitively expensive, but also potentially dangerous [4]. Training and evaluating robotic controllers in simulation [5]-[7]