“…Driving Policy Learning: While policy learning for driving using real-world data is largely restricted to IL [31]-[35], learning in simulation allows for greater algorithmic flexibility, spanning IL [5], [36], [37], RL [1], [4], [38], [39], and GPL [6], [40]. Evaluating trained policies in closed-loop simulation [5], [31], [36], [40]-[42] also offers advantages over open-loop evaluation [7], [32], [43], since in closed loop the policy's own actions determine the states it subsequently encounters. Similarly, our work leverages simulation both to generate edge-case training data and to perform closed-loop evaluation before deployment.…”