2021
DOI: 10.48550/arxiv.2107.00842
Preprint

Cross-view Geo-localization with Evolving Transformer

Abstract: In this work, we address the problem of cross-view geo-localization, which estimates the geospatial location of a street-view image by matching it against a database of geo-tagged aerial images. The cross-view matching task is extremely challenging due to drastic appearance and geometry differences across views. Unlike existing methods that predominantly rely on CNNs, here we devise a novel evolving geo-localization Transformer (EgoTR) that utilizes the properties of self-attention in Transformers to model glo…

Cited by 3 publications (3 citation statements)
References 24 publications
“…In 2020, Dosovitskiy et al [9] first applied the Transformer to the image domain, dividing continuous images into discrete tokens through Patch Embedding and converting the input into a sequence. In 2021, EgoTR [10] first introduced the Transformer model into the cross-view localization task, enabling global context learning and location-aware representations. It also proposed a novel self-attention mechanism to promote cross-layer information flow, allowing it to surpass CNN-based models.…”
Section: Transformer-based Methods
Mentioning confidence: 99%
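
The Patch Embedding step mentioned in this statement is straightforward to sketch. Below is a minimal PyTorch illustration; the 224-pixel input, 16-pixel patches, and 768-dimensional embeddings are standard ViT choices used here for illustration, not values taken from the cited papers:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Splits an image into non-overlapping patches and projects each
    patch to an embedding vector, yielding a token sequence (ViT-style)."""

    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to flattening each patch
        # and applying a shared linear projection.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # x: (B, C, H, W) -> (B, embed_dim, H/patch, W/patch)
        x = self.proj(x)
        # Flatten the spatial grid into a sequence: (B, num_patches, embed_dim)
        return x.flatten(2).transpose(1, 2)

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```

The strided convolution is the usual trick here: one call both cuts the image into patches and projects them, turning a 2D image into the 1D token sequence a Transformer expects.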
“…The inference speed is 1.48 times that of DSM and 2.52 times that of EgoTR, reaching a real-time speed of 33.78 FPS.

Method       Params (MB)   FPS (frames per second)
SAFA [6]     29.5          35.90
DSM [8]      17.9          22.72
EgoTR [10]   195.9         (lost in extraction; the stated 2.52x speedup implies ~13.4)

…the offset module has N branches with identical structures but unshared parameters. Different branches are responsible for learning the features of different offsets.…”
Section: Comparison of Model Size and Inference Speed
Mentioning confidence: 99%
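
The quoted offset module (N branches with identical structure but unshared parameters, one per offset) admits a simple sketch. Everything below is hypothetical, assuming PyTorch; the branch architecture, dimensions, and the name OffsetModule are illustrative assumptions, not the citing paper's actual design:

```python
import torch
import torch.nn as nn

class OffsetModule(nn.Module):
    """Hypothetical sketch: N branches with identical architecture but
    independent (unshared) parameters, one branch per candidate offset."""

    def __init__(self, in_dim=512, out_dim=512, num_offsets=8):
        super().__init__()
        # nn.ModuleList keeps each branch's parameters separate,
        # so the branches share structure but never share weights.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                          nn.Linear(out_dim, out_dim))
            for _ in range(num_offsets)
        )

    def forward(self, feat):
        # feat: (B, in_dim); each branch learns features of one offset.
        # Output: (B, num_offsets, out_dim)
        return torch.stack([b(feat) for b in self.branches], dim=1)

out = OffsetModule()(torch.randn(2, 512))
print(out.shape)  # torch.Size([2, 8, 512])
```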
“…Tulder et al [57] presented a novel cross-view transformer method to transfer information between unregistered views at the level of spatial feature maps, which achieved remarkable results in the field of multi-view medical image analysis. Yang et al [58] proposed a simple yet effective self-cross attention mechanism to improve the quality of learned representations, which improves generalization ability and encourages representations to keep evolving as the network goes deeper.…”
Section: B. Transformer in Vision
Mentioning confidence: 99%
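
As one possible reading of the self-cross attention idea quoted above, the sketch below lets queries come from the current layer while keys and values come from the previous layer's features, which is one way to realize cross-layer information flow. This is an assumption-laden PyTorch illustration, not EgoTR's published formulation:

```python
import torch
import torch.nn as nn

class SelfCrossAttention(nn.Module):
    """Illustrative sketch: queries are taken from the current layer's
    input, while keys/values come from the previous layer's features,
    encouraging information flow across layers. (The exact formulation
    in the cited work may differ; this is an assumption.)"""

    def __init__(self, dim=768, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, x_curr, x_prev):
        # x_curr, x_prev: (B, N, dim) token sequences from adjacent layers.
        q = self.norm_q(x_curr)
        kv = self.norm_kv(x_prev)
        out, _ = self.attn(q, kv, kv)  # attend from current to previous
        return x_curr + out  # residual connection

y = SelfCrossAttention()(torch.randn(1, 196, 768), torch.randn(1, 196, 768))
print(y.shape)  # torch.Size([1, 196, 768])
```

Setting x_prev = x_curr recovers ordinary self-attention, which makes the cross-layer variant a drop-in generalization of a standard Transformer block.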