2022
DOI: 10.48550/arxiv.2203.04050
Preprint

BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs

Abstract: Semantic segmentation in bird's eye view (BEV) is an important task for autonomous driving. Though this task has attracted a large amount of research efforts, it is still challenging to flexibly cope with arbitrary (single or multiple) camera sensors equipped on the autonomous vehicle. In this paper, we present BEVSegFormer, an effective transformer-based method for BEV semantic segmentation from arbitrary camera rigs. Specifically, our method first encodes image features from arbitrary cameras with a shared ba…

Citations: cited by 11 publications (18 citation statements)
References: 27 publications
“…We can observe that BEVerse-Tiny already obtains the mIoU of 48.7 and outperforms existing methods. Furthermore, BEVerse-Small achieves 51.7 mIoU, which is 7.1 points higher than the previous best method [49]. Motion prediction.…”
Section: Results (mentioning)
confidence: 82%
“…It also proposes a learning method to build BEV features from sensory input and predicts vectorized map elements. BEVSegFormer [49] proposes the multi-camera deformable attention to transform image-view features to BEV representations for semantic map construction. Different from these single-task approaches, our BEVerse incorporates the semantic map construction as part of the multitask framework and uses vanilla convolutional layers for segmentation prediction.…”
Section: Semantic Map Construction (mentioning)
confidence: 99%
“…Later, View Parsing Network (VPN) [15] uses a fully connected layer to transform the image features into the BEV features and directly supervise the features in the BEV space in an end-to-end manner. Similarly, BEVSegFormer [17] uses the deformable attention [25] mechanism to achieve end-to-end mapping. These methods avoid the explicit mapping between image and BEV spaces, but this property also makes them hard to adopt the geometry prior.…”
Section: Related Work (mentioning)
confidence: 99%
“…The first one is the 100m × 100m setting [11,18,23] with two classes road and lane. The other one is the 60m × 30m setting [10,17,24] with three classes boundary, divider, and ped crossing. In this work, we also propose a new 160m × 100m setting for a more comprehensive evaluation, as shown in Tab.…”
Section: Dataset and Evaluation Settings (mentioning)
confidence: 99%