In this paper, we propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention. Current anchor-based monocular 3D object detection methods suffer from feature mismatching. To overcome this, we propose a two-step feature alignment approach. In the first step, shape alignment is performed so that the receptive field of the feature map focuses on the pre-defined anchors with high confidence scores. In the second step, center alignment is used to align the features at the 2D/3D centers. Further, it is often difficult to learn global information and capture long-range relationships, both of which are important for predicting the depth of objects. Therefore, we propose a novel asymmetric non-local attention block with multi-scale sampling to extract depth-wise features. The proposed M3DSSD achieves significantly better performance than existing monocular 3D object detection methods on the KITTI dataset, in both the 3D object detection and bird's eye view tasks. The code is released at https://github.com/mumianyuxin/M3DSSD.
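To make the idea of asymmetric non-local attention with multi-scale sampling more concrete, the following is a minimal NumPy sketch and not the released M3DSSD implementation. It assumes that "asymmetric" means every pixel acts as a query while keys and values come from a much smaller set of average-pooled samples. The function names, the pooling scales `(1, 2, 4)`, the `sqrt(C)` scaling, and the residual connection are illustrative assumptions; the learned projection layers that a real block would contain are omitted.

```python
import numpy as np


def adaptive_avg_pool(feat, out_size):
    """Average-pool a (C, H, W) feature map down to (C, out_size, out_size)."""
    c, h, w = feat.shape
    pooled = np.empty((c, out_size, out_size), dtype=feat.dtype)
    for i in range(out_size):
        r0 = (i * h) // out_size
        r1 = -(-((i + 1) * h) // out_size)  # ceiling division
        for j in range(out_size):
            c0 = (j * w) // out_size
            c1 = -(-((j + 1) * w) // out_size)
            pooled[:, i, j] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return pooled


def asymmetric_non_local(feat, scales=(1, 2, 4)):
    """Hypothetical sketch: N = H*W queries attend to S pooled samples (S << N)."""
    c, h, w = feat.shape
    query = feat.reshape(c, h * w).T  # (N, C), one query per pixel
    # Keys/values: pooled samples gathered at several scales (assumed scheme).
    samples = np.concatenate(
        [adaptive_avg_pool(feat, s).reshape(c, -1) for s in scales], axis=1
    )  # (C, S)
    logits = query @ samples / np.sqrt(c)  # (N, S) instead of (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
    context = weights @ samples.T  # (N, C) aggregated global context
    # Residual connection back onto the input map (an assumption of this sketch).
    return feat + context.T.reshape(c, h, w), weights
```

With scales `(1, 2, 4)` the attention matrix has 1 + 4 + 16 = 21 columns per pixel rather than one column per pixel, which is where the cost saving over a symmetric non-local block would come from under these assumptions.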
The most recent 3D object detectors for point clouds rely on the coarse voxel-based representation rather than the accurate point-based representation, owing to the higher box recall of the voxel-based Region Proposal Network (RPN). However, detection performance is severely restricted by the loss of pose details in the voxels and by the variability in the relationship between the visible part and the full view of objects, which arises from the perspective issue in data acquisition. In this paper, we propose a point-to-voxel feature learning approach that voxelizes the point cloud with high-level point-wise semantic features, which enables both voxel-wise and point-wise feature learning. We also propose an attentive corner aggregation module that attentively aggregates the local point cloud surrounding a 3D proposal from the perspectives of the eight corners of the proposal's 3D bounding box. The experimental results on the competitive KITTI 3D object detection benchmark show that the proposed method achieves state-of-the-art performance.
INDEX TERMS 3D object detection, point clouds, attention mechanism, autonomous driving.
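The point-to-voxel step described above can be illustrated with a small NumPy sketch. It is not the authors' method: it assumes that voxelizing with point-wise features means averaging the feature vectors of all points that fall into the same voxel. The function name `voxelize_point_features` and its parameters `voxel_size` and `origin` are hypothetical, and the per-point features would in practice come from a point-based network that is not shown here.

```python
import numpy as np


def voxelize_point_features(points, features, voxel_size, origin):
    """Average per-point features inside each occupied voxel (assumed scheme).

    points:   (N, 3) xyz coordinates
    features: (N, C) point-wise features
    Returns the integer voxel coordinates (V, 3) and voxel features (V, C).
    """
    points = np.asarray(points, dtype=float)
    features = np.asarray(features, dtype=float)
    # Integer voxel index of every point.
    idx = np.floor((points - origin) / voxel_size).astype(np.int64)
    # Unique occupied voxels and, for each point, the row of its voxel.
    voxels, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    # Scatter-add the features of all points that share a voxel.
    sums = np.zeros((len(voxels), features.shape[1]))
    np.add.at(sums, inverse, features)
    counts = np.bincount(inverse, minlength=len(voxels))
    return voxels, sums / counts[:, None]
```

Mean pooling is only one possible choice; max pooling or a learned aggregation could be substituted at the scatter step without changing the rest of the sketch.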