2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
DOI: 10.1109/iccvw.2019.00109

Short-Term Prediction and Multi-Camera Fusion on Semantic Grids

Abstract: An environment representation (ER) is a substantial part of every autonomous system. It introduces a common interface between perception and other system components, such as decision making, and allows downstream algorithms to deal with abstracted data without knowledge of the sensors used. In this work, we propose and evaluate a novel architecture that generates an egocentric, grid-based, predictive, and semantically-interpretable ER. In particular, we provide a proof of concept for the spatio-temporal fusion …
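
As an illustration of what an egocentric, grid-based, semantically-interpretable ER can look like, here is a minimal sketch in Python; the class list, grid size, and per-cell probability layout are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

# Hypothetical egocentric semantic grid: H x W cells around the ego vehicle,
# each cell holding a probability distribution over an assumed set of classes.
CLASSES = ["road", "sidewalk", "vehicle", "pedestrian", "unknown"]

class SemanticGrid:
    def __init__(self, height=100, width=100, resolution_m=0.2):
        self.resolution_m = resolution_m              # metres per cell
        self.probs = np.full((height, width, len(CLASSES)),
                             1.0 / len(CLASSES))      # uniform prior per cell

    def most_likely_class(self):
        # Downstream modules (e.g. planning) query an abstracted,
        # sensor-agnostic view of the scene.
        return np.argmax(self.probs, axis=-1)         # H x W map of class indices

grid = SemanticGrid()
print(grid.most_likely_class().shape)  # (100, 100)
```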

Cited by 12 publications (8 citation statements); References 32 publications.

“…Recent methods [50,62,69,25] estimate future frames by reasoning about shape, egomotion, and foreground motion separately. However, none of these methods reason explicitly about individual instances, while our method yields a full future panoptic segmentation forecast.…”
Section: Methods That Anticipate
confidence: 99%
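
As a rough illustration of the "egomotion" part of that decomposition, the sketch below warps a current top-down semantic map by a planar rigid ego transform to obtain a naive next-step prediction; the transform values and scipy-based resampling are assumptions for illustration, not any cited method's implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_by_egomotion(semantic_map, dx_cells, dy_cells, dtheta_rad):
    """Shift/rotate the current class-index map according to assumed
    ego translation (in cells) and yaw change to predict the next step."""
    h, w = semantic_map.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Inverse transform: for each target cell, look up its source location.
    cos_t, sin_t = np.cos(-dtheta_rad), np.sin(-dtheta_rad)
    src_x = cos_t * (xs - cx) - sin_t * (ys - cy) + cx - dx_cells
    src_y = sin_t * (xs - cx) + cos_t * (ys - cy) + cy - dy_cells
    # Nearest-neighbour sampling keeps discrete class labels intact.
    return map_coordinates(semantic_map, [src_y, src_x], order=0, cval=0)

current = np.random.randint(0, 5, size=(100, 100))   # toy class-index map
predicted = warp_by_egomotion(current, dx_cells=3, dy_cells=0, dtheta_rad=0.05)
```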
“…As a result, the pixel-level semantics from the semantic segmentation can represent uncountable or difficult-to-count street elements. In this study, we adopted DeepLab (version 3) pre-trained on the Cityscapes Dataset [53] because it is one of the best-performing models on the open Cityscapes benchmark [59,60].…”
Section: Semantic Feature Detection
confidence: 99%
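
A minimal sketch of pixel-level semantic segmentation with a DeepLabV3 model via torchvision follows; note that torchvision's bundled weights are trained on a COCO/VOC label set, so a Cityscapes-trained checkpoint (as used in the cited study) would have to be loaded separately, and the image path is hypothetical.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# DeepLabV3 with a ResNet-50 backbone. The "DEFAULT" weights are COCO/VOC;
# a Cityscapes-trained checkpoint is assumed for street-scene classes.
model = models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("street_scene.jpg").convert("RGB")       # hypothetical input
with torch.no_grad():
    logits = model(preprocess(image).unsqueeze(0))["out"]   # (1, C, H, W)
labels = logits.argmax(dim=1).squeeze(0)                    # per-pixel class indices
```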
“…Erkent et al [4] predict a semantic grid from camera images but perform an early fusion with an occupancy grid computed from lidar data and evaluate in a single-camera setting. Hoyer et al [6] predict a top-down semantic grid representation and use images from cameras situated at different angles. However, their pipeline first runs semantic segmentation on the images and then uses stereo depth to map each pixel's semantic label to a top-down grid, whereas we operate directly on raw camera images, which reduces pipeline complexity.…”
Section: Related Work
confidence: 99%
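
To make the pixel-to-grid mapping concrete, here is a minimal sketch that back-projects each labelled pixel using its stereo depth and assumed pinhole intrinsics, then bins the resulting points into an egocentric top-down grid by majority vote; the intrinsics, grid extent, and cell size are illustrative values, not those of the cited pipeline.

```python
import numpy as np

def labels_to_topdown_grid(labels, depth, fx, cx,
                           grid_size=100, cell_m=0.2, n_classes=19):
    """Map per-pixel semantic labels into an egocentric top-down grid
    using per-pixel depth and an assumed pinhole camera model."""
    h, w = labels.shape
    us = np.tile(np.arange(w), (h, 1))         # pixel column indices
    z = depth                                  # forward distance (metres)
    x = (us - cx) * z / fx                     # lateral offset (metres)
    # Convert metric coordinates to grid indices; the ego camera sits at the
    # bottom-centre of the grid, with row 0 farthest ahead.
    col = np.round(x / cell_m + grid_size / 2).astype(int)
    row = np.round(grid_size - 1 - z / cell_m).astype(int)
    votes = np.zeros((grid_size, grid_size, n_classes))
    valid = (row >= 0) & (row < grid_size) & (col >= 0) & (col < grid_size)
    # Accumulate class votes per cell; the majority class labels the cell.
    np.add.at(votes, (row[valid], col[valid], labels[valid]), 1)
    return votes.argmax(axis=-1)

# Toy usage with random labels/depth and assumed intrinsics.
labels = np.random.randint(0, 19, size=(256, 512))
depth = np.random.uniform(2.0, 20.0, size=(256, 512))
bev = labels_to_topdown_grid(labels, depth, fx=500.0, cx=256.0)
```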