Predicting Semantic Map Representations From Images Using Pyramid Occupancy Networks

Roddick, Thomas; Cipolla, Roberto

doi:10.1109/cvpr42600.2020.01115

Cited by 202 publications

(163 citation statements)

References 13 publications

Supporting

Mentioning

161

Contrasting

Order By: Relevance

“…This can be achieved by semantic segmentation in BEV for the targets like drivable areas, car parking, lane dividers, stopping lines, and so on. The methods with leading performance in benchmark [1] are always with the same perception pipeline [34,30,28,47]. In this pipeline, there are four main components: an image-view encoder for encoding features in image view, a view transformer for transforming the features from image view to BEV, a BEV encoder for further encoding the feature in BEV, and a simple head for pixel-wise classification.…”

Section: Semantic Segmentation In Bevmentioning

confidence: 99%

“…The substitutions include PON [34], VPN [28], PYVA [47], and so on. The adopted view transformer takes the imageview feature as input and densely predicts the depth through a classification manner.…”

Section: View Transformermentioning

confidence: 99%

“…nuScenes dataset includes 1000 scenes with images from 6 cameras with surrounding views, points from 5 Radars and 1 LiDAR. It is the up-todate popular benchmark for 3D object detection [42,44,43,29] and BEV semantic segmentation [34,30,28,47]. The scenes are officially split into 700/150/150 scenes for training/validation/testing.…”

Section: Experimental Settingsmentioning

confidence: 99%

“…However, with respect to the scene of autonomous driving where both performance and time-efficiency are required, the main tasks like 3D object detection and Bird-Eye-View (BEV) semantic segmentation are still conducted by different paradigms in the up-to-date benchmarks. For example, in the nuScenes [1] benchmark, the multi-camera 3D object detection track is dominated by image-view-based methods like FCOS3D [42] and PGD [43], while the BEV semantic segmentation track is dominated by the BEV-based methods like PON [34], Lift-Splat-Shoot [30], and VPN [28]. Which view space is more reasonable for perception in autonomous driving, and can we handle these tasks in a unified view space?…”

Section: Introductionmentioning

confidence: 99%

See 3 more Smart Citations

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

Huang¹,

Guan²,

Zhu³

et al. 2021

Preprint

168

View full text Add to dashboard Cite

Autonomous driving perceives the surrounding environment for decision making, which is one of the most complicated scenes for visual perception. The great power of paradigm innovation in solving the 2D object detection task inspires us to seek an elegant, feasible, and scalable paradigm for pushing the performance boundary in this area. To this end, we contribute the BEVDet paradigm in this paper. BEVDet is developed by following the principle of detecting the 3D objects in Bird-Eye-View (BEV), where route planning can be handily performed. In this paradigm, four kinds of modules are conducted in succession with different roles: an image-view encoder for encoding feature in image view, a view transformer for feature transformation from image view to BEV, a BEV encoder for further encoding feature in BEV, and a task-specific head for predicting the targets in BEV. We merely reuse the existing modules for constructing BEVDet and make it feasible for multi-camera 3D object detection by constructing an exclusive data augmentation strategy. The proposed paradigm works well in multi-camera 3D object detection and offers a good trade-off between computing budget and performance. BEVDet with 704×256 (1/8 of the competitors) image size scores 29.4% mAP and 38.4% NDS on the nuScenes val set, which is comparable with FCOS3D (i.e., 2008.2 GFLOPs, 1.7 FPS, 29.5% mAP, and 37.2% NDS), while requires just 12% computing budget of 239.4 GFLOPs and runs 4.3 times faster. Scaling up the input size to 1408×512, BEVDet scores 34.9% mAP and 41.7% NDS, which requires just 601.4 GFLOPs and significantly suppresses FCOS3D by 5.4% mAP and 4.5% NDS. The superiority of BEVDet tells the magic of paradigm innovation.

show abstract

Section: Semantic Segmentation In Bevmentioning

confidence: 99%

Section: View Transformermentioning

confidence: 99%

Section: Experimental Settingsmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

Huang¹,

Guan²,

Zhu³

et al. 2021

Preprint

168

View full text Add to dashboard Cite

show abstract

“…Mapping, navigation, and path planning have been one of the major research focus areas in robotics and auto industries in the past two decades [ 1 , 2 , 3 , 4 , 5 , 6 , 7 ]. Major contributions have been introduced, particularly to increasing the perception and understanding of the robots’ surrounding environment.…”

Section: Introductionmentioning

confidence: 99%

LiDAR-Based Glass Detection for Improved Occupancy Grid Mapping

Tibebu

Roche

Silva

et al. 2021

Sensors

View full text Add to dashboard Cite

Creating an accurate awareness of the environment using laser scanners is a major challenge in robotics and auto industries. LiDAR (light detection and ranging) is a powerful laser scanner that provides a detailed map of the environment. However, efficient and accurate mapping of the environment is yet to be obtained, as most modern environments contain glass, which is invisible to LiDAR. In this paper, a method to effectively detect and localise glass using LiDAR sensors is proposed. This new approach is based on the variation of range measurements between neighbouring point clouds, using a two-step filter. The first filter examines the change in the standard deviation of neighbouring clouds. The second filter uses a change in distance and intensity between neighbouring pules to refine the results from the first filter and estimate the glass profile width before updating the cartesian coordinate and range measurement by the instrument. Test results demonstrate the detection and localisation of glass and the elimination of errors caused by glass in occupancy grid maps. This novel method detects frameless glass from a long range and does not depend on intensity peak with an accuracy of 96.2%.

show abstract