“…Prior works in the field of sensor fusion have mostly focused on the perception aspect of driving, e.g. 2D and 3D object detection [22,12,66,9,44,31,34,61,33,37], motion forecasting [22,36,5,35,63,6,19,38,32,9], and depth estimation [24,60,61,33]. These methods focus on learning a state representation that captures the geometric and semantic information of the 3D scene.…”