“…In recent years, several studies have addressed 3D object detection based on camera and LiDAR fusion. According to the stage at which fusion takes place, these methods can be classified into a serial type [10, 11, 12, 13, 14, 15, 16] and a parallel type [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]. Serial methods, represented by F-PointNet [13], usually take the camera image as input first and apply an image object detection or semantic segmentation algorithm to obtain the spatial location of the object. This location is then projected into the LiDAR point cloud to extract the points in the frustum region around the object, and finally a conventional point cloud 3D object detection algorithm is applied to obtain the 3D bounding boxes.…”
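
The frustum extraction step of the serial pipeline can be illustrated with a minimal sketch. It is not the F-PointNet implementation. It assumes a pinhole camera model, that the LiDAR points have already been transformed into the camera frame, and that a 2D box is available from some image detector. The names `frustum_mask`, `points_cam`, `K` and `box2d`, as well as the numeric values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def frustum_mask(points_cam, K, box2d):
    """Return a boolean mask of the points whose image projection lies in box2d.

    points_cam: (N, 3) array of points in the camera frame (assumed), z forward.
    K:          (3, 3) pinhole intrinsic matrix (assumed).
    box2d:      (xmin, ymin, xmax, ymax) in pixels, e.g. from a 2D detector.
    """
    z = points_cam[:, 2]
    in_front = z > 0                       # points behind the camera cannot project
    z_safe = np.where(in_front, z, 1.0)    # avoid dividing by zero or negative depth
    uvw = points_cam @ K.T                 # homogeneous pixel coordinates
    u = uvw[:, 0] / z_safe
    v = uvw[:, 1] / z_safe
    xmin, ymin, xmax, ymax = box2d
    inside = (u >= xmin) & (u <= xmax) & (v >= ymin) & (v <= ymax)
    return in_front & inside


# Toy example with made-up intrinsics and points.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
points_cam = np.array([[0.0, 0.0, 10.0],     # projects to the box centre
                       [5.0, 0.0, 10.0],     # projects to the right of the box
                       [0.0, 0.0, -10.0],    # behind the camera
                       [0.25, -0.25, 5.0]])  # projects inside the box
box2d = (40.0, 40.0, 60.0, 60.0)

mask = frustum_mask(points_cam, K, box2d)
frustum_points = points_cam[mask]            # input to the point cloud 3D detector
```

In a full serial pipeline, `frustum_points` would be passed to a point cloud 3D detection network that regresses the 3D bounding box. That stage is omitted here.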