Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection
Preprint, 2020
DOI: 10.48550/arxiv.2012.15712

Abstract: Recent advances in 3D object detection heavily rely on how the 3D data are represented, i.e., voxel-based or point-based representation. Many existing high-performance 3D detectors are point-based because this structure better retains precise point positions. Nevertheless, point-level features lead to high computation overhead due to unordered storage. In contrast, the voxel-based structure is better suited for feature extraction but often yields lower accuracy because the input data are divided into grids…

Cited by 33 publications (72 citation statements) | References 20 publications
“…cos(δ), are zero, which can stall backpropagation learning at the extreme opposite output. To resolve this problem, researchers [50-61] regress both the cosine and sine of the angle, (cos(θ), sin(θ)). Converting a 1D scalar orientation θ into this 2D representation maps it onto the Cartesian plane, with the angle's cosine projected onto the x-axis and its sine onto the y-axis.…”
Section: Alpha (Local/Allocentric Rotation)
Mentioning confidence: 99%
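The (cos θ, sin θ) encoding described in the citation above can be sketched in a few lines; decoding uses `atan2`, which handles all four quadrants. Function names here are illustrative, not from any cited implementation.

```python
import math

def encode_angle(theta):
    """Encode a scalar orientation as (cos, sin) so the target is
    continuous across the angle wrap-around at ±pi."""
    return math.cos(theta), math.sin(theta)

def decode_angle(c, s):
    """Recover the angle from the regressed pair; atan2 resolves the
    correct quadrant and tolerates non-unit-norm network outputs."""
    return math.atan2(s, c)

theta = -3.0                      # near the -pi wrap-around
c, s = encode_angle(theta)
recovered = decode_angle(c, s)    # equals theta for theta in (-pi, pi]
```

In practice a detector regresses the pair directly and may add a unit-norm penalty, since the network's raw outputs need not lie on the unit circle.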
“…These representations are then fed into 2D CNN architectures to regress bounding box locations and categories. Another popular lidar-based approach explores various representations of points using 3D convolutional networks such as voxels [11,61,65,74], spheres [32,46], pillars [25], or learning point features directly [35,47] using point operators [39,40]. These features are transformed to a BEV feature map where 3D boxes are generated using sparse convolution or transformer architectures [33,35].…”
Section: Introduction
Mentioning confidence: 99%
“…This is similar to the processing pipeline widely studied for 2D images, but quantization loss of position information inevitably occurs during voxelization. These are called voxel-based approaches [5, 6, 14, 40, 45-47]. On the other hand, building on PointNet and PointNet++ proposed by Qi et al. [25, 26], some methods learn features directly from the raw point cloud and predict 3D bounding boxes from foreground points and their features.…”
Section: Introduction
Mentioning confidence: 99%
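The quantization loss mentioned in the citation above is easy to see in a minimal voxelization sketch: nearby points that fall into the same grid cell become indistinguishable. The voxel size and range below are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

VOXEL_SIZE = np.array([0.05, 0.05, 0.1])  # meters per voxel along x, y, z (assumed)
RANGE_MIN = np.array([0.0, -40.0, -3.0])  # lower bound of the scene range (assumed)

def voxelize(points):
    """Map each 3D point to an integer voxel index by flooring its
    position relative to the grid origin. Points sharing an index
    lose their sub-voxel positions: the quantization loss."""
    return np.floor((points - RANGE_MIN) / VOXEL_SIZE).astype(np.int64)

pts = np.array([[10.01, 0.02, 0.00],
                [10.03, 0.01, 0.04]])  # two distinct points...
idx = voxelize(pts)                    # ...mapped to the same voxel index
```

Real pipelines additionally cap the number of points per voxel and aggregate their features (e.g., by averaging), which compounds this loss but enables regular, convolution-friendly tensors.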
“…PV-RCNN [28] is a representative work among them; its performance reaches the state of the art, but the complex point operations it introduces prevent it from meeting real-time requirements (10 fps). Voxel R-CNN [6] argued that traditional voxel-based methods, which extract features from the bird's-eye view (BEV), ignore 3D structural information, reducing detector performance. It therefore proposed the voxel RoI pooling operation, achieving performance similar to or even better than that of PV-RCNN on the KITTI dataset [8].…”
Section: Introduction
Mentioning confidence: 99%
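The voxel RoI pooling idea referenced above can be illustrated with a heavily simplified sketch: divide each RoI into a regular lattice and pool sparse voxel features per cell, so the second stage sees 3D structure rather than a flattened BEV map. All names here are illustrative; the actual Voxel R-CNN operator gathers neighboring voxels within a query radius and applies learned aggregation rather than plain max-pooling.

```python
import numpy as np

def voxel_roi_pool(voxel_coords, voxel_feats, roi_min, roi_max, grid=2):
    """Simplified RoI pooling over a sparse voxel set: split the RoI into
    a grid x grid x grid lattice and max-pool the features of voxels
    whose centers fall inside each cell."""
    cell = (roi_max - roi_min) / grid
    pooled = np.zeros((grid, grid, grid, voxel_feats.shape[1]))
    rel = (voxel_coords - roi_min) / cell          # cell-relative positions
    inside = np.all((rel >= 0) & (rel < grid), axis=1)
    cells = np.floor(rel[inside]).astype(int)
    for (i, j, k), f in zip(cells, voxel_feats[inside]):
        pooled[i, j, k] = np.maximum(pooled[i, j, k], f)
    return pooled.reshape(-1)                      # flattened per-RoI feature

coords = np.array([[0.5, 0.5, 0.5], [1.5, 1.5, 1.5]])
feats = np.array([[1.0, 2.0], [3.0, 4.0]])
roi_feat = voxel_roi_pool(coords, feats, np.zeros(3), np.full(3, 2.0))
```

The fixed-size output per RoI is what lets a standard fully connected head refine boxes, regardless of how many voxels each RoI actually contains.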