“…Recent methods follow this concept to output planning results for the ego vehicle given sensor inputs [17,18,25,53,63]. Most of them follow a conventional pipeline of perception [21,32,33,57,65], prediction [11,14,36,66], and planning [22,23,54,67]. They usually first perform BEV perception to extract relevant information (e.g., 3D agent boxes, semantic maps, tracklets) and then use this information to infer the future trajectories of the surrounding agents and the ego vehicle.…”
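
To make the three-stage structure described in the excerpt concrete, here is a minimal toy sketch of a perception, prediction, and planning pipeline. It is not the method of any cited work. Every name in it (`Agent`, `perceive`, `predict`, `plan`, `run_pipeline`) is hypothetical, and so are the stand-ins for each stage: "perception" simply reads agents from an already-annotated frame, prediction assumes constant velocity, and planning picks the fastest straight-ahead ego trajectory that keeps an assumed safety margin.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in BEV metres, ego vehicle at the origin


@dataclass
class Agent:
    position: Point
    velocity: Point  # metres per second


def perceive(sensor_frame: List[dict]) -> List[Agent]:
    # Stand-in for BEV perception (assumption): the frame already lists agents.
    return [Agent(tuple(d["position"]), tuple(d["velocity"])) for d in sensor_frame]


def predict(agents: List[Agent], horizon: int, dt: float) -> List[List[Point]]:
    # Stand-in for prediction (assumption): constant-velocity extrapolation.
    return [
        [
            (a.position[0] + a.velocity[0] * dt * k,
             a.position[1] + a.velocity[1] * dt * k)
            for k in range(1, horizon + 1)
        ]
        for a in agents
    ]


def plan(
    agent_futures: List[List[Point]],
    horizon: int,
    dt: float,
    candidate_speeds: Tuple[float, ...] = (10.0, 5.0, 0.0),
    safety_radius: float = 2.0,
) -> List[Point]:
    # Stand-in for planning (assumption): try straight-ahead ego trajectories
    # from fastest to slowest and return the first one that stays at least
    # safety_radius away from every predicted agent position at every step.
    for speed in candidate_speeds:
        ego = [(speed * dt * k, 0.0) for k in range(1, horizon + 1)]
        safe = all(
            math.dist(ego[k], future[k]) >= safety_radius
            for future in agent_futures
            for k in range(horizon)
        )
        if safe:
            return ego
    # No candidate is safe: stay in place.
    return [(0.0, 0.0)] * horizon


def run_pipeline(sensor_frame: List[dict], horizon: int = 3, dt: float = 1.0) -> List[Point]:
    agents = perceive(sensor_frame)               # stage 1: perception
    futures = predict(agents, horizon, dt)        # stage 2: prediction
    return plan(futures, horizon, dt)             # stage 3: planning
```

For example, `run_pipeline([{"position": (10.0, 0.0), "velocity": (0.0, 0.0)}])` finds a stationary agent directly ahead, rejects both moving candidates, and falls back to the zero-speed trajectory. The point of the sketch is only the data flow: each stage consumes the explicit output of the previous one, which is the sequential design the excerpt attributes to conventional pipelines.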