2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.01528
Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation

Cited by 84 publications (84 citation statements): 2 supporting, 82 mentioning, 0 contrasting. References 40 publications.
“…Methods [30,16,9] utilize a multilayer perceptron to learn the translation from the perspective view to the BEV. PYVA [50] proposes a cross-view transformer that converts the front-view monocular image into the BEV, but this paradigm is not suitable for fusing multi-camera features due to the computational cost of the global attention mechanism [42]. In addition to spatial information, previous works [18,38,6] also consider temporal information by stacking BEV features from several timestamps.…”
Section: Camera-based 3D Perception (mentioning)
confidence: 99%
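The MLP view-translation paradigm this statement attributes to [30,16,9] can be summarized in a few lines: the front-view feature map is flattened and a fully connected network relearns the spatial correspondence to a BEV grid. Below is a minimal sketch under assumed feature sizes; the class name and all shapes are illustrative, not taken from the cited papers.

```python
# Minimal sketch of MLP-based perspective-to-BEV translation
# (hypothetical shapes and names; not the cited authors' code).
import torch
import torch.nn as nn

class MLPViewTranslator(nn.Module):
    """Flattens a front-view feature map and maps it to a BEV grid with an MLP."""
    def __init__(self, fv_hw=(12, 40), bev_hw=(32, 32)):
        super().__init__()
        self.bev_hw = bev_hw
        fv_dim = fv_hw[0] * fv_hw[1]
        bev_dim = bev_hw[0] * bev_hw[1]
        # The MLP acts on the flattened spatial axis, so every BEV cell can
        # draw from every front-view cell (no camera geometry required).
        self.mlp = nn.Sequential(
            nn.Linear(fv_dim, fv_dim), nn.ReLU(inplace=True),
            nn.Linear(fv_dim, bev_dim),
        )

    def forward(self, fv_feat):               # fv_feat: (B, C, Hf, Wf)
        b, c, _, _ = fv_feat.shape
        x = fv_feat.flatten(2)                # (B, C, Hf*Wf)
        x = self.mlp(x)                       # (B, C, Hb*Wb)
        return x.view(b, c, *self.bev_hw)     # (B, C, Hb, Wb)

bev = MLPViewTranslator()(torch.randn(2, 64, 12, 40))  # -> (2, 64, 32, 32)
```

Because the mapping is fully connected over spatial positions, its cost grows with the product of the two grids, which is one reason the statement flags global attention as expensive for multi-camera fusion.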
“…For example, NEAT [26] uses a transformer encoder and converts image features to BEV space with an MLP-based attention by traversing all grid locations in the BEV space. PYVA [27] generates BEV features from a CNN encoder-decoder structure via an MLP. The features are then enhanced by a transformer cross-attention module.…”
Section: Related Work (mentioning)
confidence: 99%
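As a rough illustration of the PYVA-style pipeline this statement describes, the sketch below first projects flattened front-view features to BEV cells with an MLP, then lets the projected BEV tokens attend back to the front-view tokens through cross-attention. Module names, dimensions, and the residual connection are assumptions made for readability, not the published implementation.

```python
# Hedged sketch: MLP view projection followed by cross-attention enhancement,
# in the spirit of the PYVA pipeline summarized above. Names are illustrative.
import torch
import torch.nn as nn

class CrossViewEnhancer(nn.Module):
    def __init__(self, c=64, fv_cells=12 * 40, bev_cells=32 * 32):
        super().__init__()
        self.project = nn.Sequential(          # MLP view projection
            nn.Linear(fv_cells, bev_cells), nn.ReLU(inplace=True),
            nn.Linear(bev_cells, bev_cells),
        )
        self.attn = nn.MultiheadAttention(c, num_heads=4, batch_first=True)

    def forward(self, fv_feat):                # fv_feat: (B, C, Hf, Wf)
        fv_tokens = fv_feat.flatten(2).transpose(1, 2)   # (B, Nf, C)
        bev = self.project(fv_feat.flatten(2))           # (B, C, Nb)
        bev_tokens = bev.transpose(1, 2)                 # (B, Nb, C)
        # BEV queries attend to front-view keys/values, recovering detail
        # that the fully connected projection may have blurred away.
        enhanced, _ = self.attn(bev_tokens, fv_tokens, fv_tokens)
        return enhanced + bev_tokens                     # residual, (B, Nb, C)

out = CrossViewEnhancer()(torch.randn(2, 64, 12, 40))    # -> (2, 1024, 64)
```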
“…In the cross-view domain, some novel and effective transformer structures have also been proposed for different downstream tasks. Chen et al. [55] proposed a pair of cross-view transformers to transform the feature maps into the other view and introduced a cross-view consistency loss on them. Yang et al. [56] presented a novel framework that reconstructs a local map, formed by the road layout and vehicle occupancy in the bird's-eye view, given only a front-view monocular image; a cross-view transformation module was proposed to strengthen the view transformation and scene understanding. Tulder et al. [57] presented a novel cross-view transformer method to transfer information between unregistered views at the level of spatial feature maps, which achieved remarkable results in the field of multi-view medical image analysis.…”
Section: B. Transformer in Vision (mentioning)
confidence: 99%
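One plausible reading of the paired cross-view transformers with a consistency loss, attributed above to Chen et al. [55], is a cycle constraint: two mappers transfer token features between views, and a loss penalizes the round trip back to the source view. Everything in the sketch below (ViewMapper, the learned queries, the L1 cycle loss) is a hypothetical stand-in, not the cited authors' code.

```python
# Hedged sketch of a cross-view consistency (cycle) constraint between two
# learned view mappers. All names, shapes, and the loss choice are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewMapper(nn.Module):
    """Stand-in for one cross-view transformer: maps source-view tokens to the other view."""
    def __init__(self, c=64, n_queries=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(c, num_heads=4, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, n_queries, c))  # target-view queries

    def forward(self, src_tokens):                       # (B, N, C)
        q = self.query.expand(src_tokens.size(0), -1, -1)
        out, _ = self.attn(q, src_tokens, src_tokens)
        return out                                       # (B, n_queries, C)

a_to_b, b_to_a = ViewMapper(), ViewMapper()
feat_a = torch.randn(2, 256, 64)          # view-A feature tokens
pred_b = a_to_b(feat_a)                   # predicted view-B features
cycle_a = b_to_a(pred_b)                  # mapped back to view A
# Cross-view consistency: the round trip should reproduce the source features.
loss = F.l1_loss(cycle_a, feat_a)
```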