2019
DOI: 10.48550/arxiv.1908.09492
Preprint

Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection

Abstract: This report presents our method, which won the nuScenes 3D Detection Challenge [17] held at the Workshop on Autonomous Driving (WAD, CVPR 2019). Generally, we utilize sparse 3D convolution to extract rich semantic features, which are then fed into a class-balanced multi-head network to perform 3D object detection. To handle the severe class imbalance problem inherent in autonomous driving scenarios, we design a class-balanced sampling and augmentation strategy to generate a more balanced data distribution. Furt…
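The class-balanced sampling idea lends itself to a short illustration. Below is a minimal Python sketch of frame-level duplication loosely modeled on the paper's dataset-sampling strategy; the function name, the equal-share weighting, and the stochastic rounding are illustrative choices, not the authors' exact procedure. Frames containing rare classes are duplicated so that each class contributes a roughly equal share of the resampled training set.

```python
import random
from collections import Counter

def class_balanced_duplication(frames, num_classes):
    """Resample a dataset so each class contributes a roughly equal share.

    frames: list of (frame_id, class_id_set) pairs; every frame is assumed
    to contain at least one annotated class. Returns a list of frame ids,
    with frames holding rare classes duplicated and frames holding only
    common classes kept (or dropped) proportionally.
    """
    total = len(frames)
    # Fraction of frames in which each class appears.
    freq = Counter(c for _, classes in frames for c in classes)
    target = 1.0 / num_classes  # desired per-class share
    resampled = []
    for frame_id, classes in frames:
        # A frame is duplicated according to its rarest class.
        ratio = max(target / (freq[c] / total) for c in classes)
        # Stochastic rounding: e.g. ratio 2.3 -> 2 copies, plus a 30% chance of a third.
        copies = int(ratio) + (random.random() < ratio - int(ratio))
        resampled.extend([frame_id] * copies)
    random.shuffle(resampled)
    return resampled

# Example: class 2 appears in only one of four frames, so frame "f3" is
# duplicated more often than the frames dominated by the common class 0.
frames = [("f0", {0}), ("f1", {0}), ("f2", {0, 1}), ("f3", {1, 2})]
print(class_balanced_duplication(frames, num_classes=3))
```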

Cited by 101 publications (167 citation statements) · References 23 publications
“…Training Parameters: Models are trained with the AdamW [25] optimizer with gradient clipping and a learning rate of 2e-4, at a total batch size of 64 on 8 NVIDIA GPUs; following [48], all models are trained with CBGS [54]. At test time, the input image is scaled by a factor of 0.48 and cropped to 704×256 resolution with a region of (x1, x2, y1, y2) = (32, 736, 176, 432).…”
Section: Experimental Settings (mentioning)
confidence: 99%
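The quoted test-time preprocessing is concrete enough to sketch. Assuming a 1600×900 nuScenes frame and PIL (the function name and resampling filter are my choices, not from the cited paper), scaling by 0.48 and applying the (x1, x2, y1, y2) crop yields the stated 704×256 input:

```python
from PIL import Image

def scale_and_crop(img, scale=0.48, crop=(32, 736, 176, 432)):
    """Scale an image, then crop the (x1, x2, y1, y2) region.

    For a 1600x900 frame, scaling by 0.48 gives 768x432; the crop
    (32, 736, 176, 432) then yields a 704x256 input.
    """
    w, h = img.size
    img = img.resize((int(w * scale), int(h * scale)), Image.BILINEAR)
    x1, x2, y1, y2 = crop
    # PIL's crop takes (left, upper, right, lower).
    return img.crop((x1, y1, x2, y2))

frame = Image.new("RGB", (1600, 900))
assert scale_and_crop(frame).size == (704, 256)
```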
“…We evaluated Fixy on two AV perception datasets: an internal dataset from our research organization and the publicly available Lyft Level 5 perception dataset [13]. The Lyft dataset has been used to develop models [33] and host competitions [27]. Both datasets consist of many scenes of LIDAR and camera data that were densely labeled with 3D bounding boxes by leading external vendors ("human-proposed labels").…”
Section: Methods (mentioning)
confidence: 99%
“…Observation sources. We used three sources of observations over the data: human-proposed labels, LIDAR ML model predictions [16,33], and expert auditor labels. All sources predict 3D bounding boxes.…”
Section: Methods (mentioning)
confidence: 99%
“…For 3D detection, we use the same VoxelNet [75] and PointPillars [23] architectures following [23,66,76]. For VoxelNet, the detection range is [−54m, 54m] for the X and Y axes and [−5m, 3m] for the Z axis, while the range is [−51.2m, 51.2m] for the X and Y axes for the PointPillars architecture.…”
Section: Methods (mentioning)
confidence: 99%
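The detection ranges quoted here amount to an axis-aligned crop of the input point cloud. Below is a minimal sketch, assuming numpy and (N, 3+) point arrays; the PointPillars Z range is not given in the excerpt and is assumed here to match VoxelNet's.

```python
import numpy as np

# (x_min, y_min, z_min, x_max, y_max, z_max), in meters.
VOXELNET_RANGE = (-54.0, -54.0, -5.0, 54.0, 54.0, 3.0)
POINTPILLARS_RANGE = (-51.2, -51.2, -5.0, 51.2, 51.2, 3.0)  # Z range assumed

def crop_to_range(points, pc_range):
    """Keep the rows of an (N, 3+) point array that lie inside the range."""
    x_min, y_min, z_min, x_max, y_max, z_max = pc_range
    xyz = points[:, :3]
    mask = ((xyz >= (x_min, y_min, z_min)) &
            (xyz <= (x_max, y_max, z_max))).all(axis=1)
    return points[mask]

# Example: random x, y, z, intensity points cropped to the VoxelNet range.
pts = np.random.uniform(-60.0, 60.0, size=(1000, 4))
print(crop_to_range(pts, VOXELNET_RANGE).shape)
```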