2022
DOI: 10.48550/arxiv.2204.09967
Preprint

Transformer-Guided Convolutional Neural Network for Cross-View Geolocalization

Abstract: Ground-to-aerial geolocalization refers to localizing a ground-level query image by matching it to a reference database of geo-tagged aerial imagery. This is highly challenging due to the drastic differences in visual appearance and geometric configuration between the two views. In this work, we propose a novel Transformer-guided convolutional neural network (TransGCNN) architecture, which couples CNN-based local features with Transformer-based global representations for enhanced representation learning…
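The abstract sketches a dual-branch design in which a Transformer-based global representation guides CNN-based local features. The paper's actual architecture is not reproduced on this page, so the snippet below is only a minimal, hypothetical PyTorch sketch of that general idea: a small CNN backbone produces a local feature map, a Transformer encoder over its flattened tokens yields a global context vector, and that vector gates the local features before pooling into an image descriptor. All module names, layer sizes, and the gating scheme are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): a CNN backbone extracts local
# feature maps, a small Transformer encoder summarizes them into a global
# context vector, and that vector gates the local features before pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransformerGuidedCNN(nn.Module):
    def __init__(self, dim: int = 256, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # Lightweight CNN backbone producing a dim-channel local feature map.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Transformer branch over the flattened spatial grid of the feature map.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Global context vector -> per-channel gate for the local features.
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.cnn(x)                          # (B, C, H, W) local features
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)    # (B, H*W, C) tokens
        ctx = self.transformer(tokens).mean(dim=1)  # (B, C) global context
        gated = feat * self.gate(ctx)[:, :, None, None]  # Transformer-guided features
        desc = gated.flatten(2).mean(dim=2)         # (B, C) image descriptor
        return F.normalize(desc, dim=1)             # unit-length embedding


# Usage: embed a ground-level query and an aerial reference with a shared
# branch, then compare by cosine similarity (toy tensors, illustrative sizes).
model = TransformerGuidedCNN()
ground = torch.randn(1, 3, 128, 512)   # e.g. a ground-level panorama
aerial = torch.randn(1, 3, 256, 256)   # e.g. an aerial/satellite patch
similarity = (model(ground) * model(aerial)).sum(dim=1)
```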

Cited by 1 publication (3 citation statements) · References 30 publications
“…Baselines: Cross-view geo-localization (CVGL) has garnered significant research interest, resulting in several impressive works emerging in the field. To demonstrate the superiority of our proposed method, we selected 17 strong baselines and state-of-the-art methods in total, i.e., Workman et al [9], Vo et al [10], Zhai et al [71], Cross-View Matching Network (CVM-Net) [11], Liu et al [31], Regmi et al [12], Spatial-Aware Feature Aggregation network (SAFA) [23], Cross-View Feature Transport technique (CVFT) [24], Dynamic Similarity Matching network (DSM) [73], Toker et al [41], Layer-to-Layer Transformer (L2LTR) [14], Local Pattern Network (LPN) [26], Unit SAFA + Subtraction Attention Module (USAM) [74], LPN + USAM [74], pure transformer-based geo-localization (TransGeo) [13], Transformer-Guided Convolutional Neural Network (TransGCNN) [25], and LPN + Dynamic Weighted Decorrelation Regularization (DWDR) [27]. In particular, for a comprehensive comparison, we use their recommended settings for training.…”
Section: Results (mentioning, confidence: 99%)
“…The Unet-like [70] architecture, comprising an encoder and a decoder, has recently been widely used in generative tasks. Existing studies [25, 42] demonstrate that the attention mechanism in a transformer is excellent at modeling global contextual information, while a CNN excels at encoding local semantic information. With these properties in mind, we propose a novel generative module that adopts a Unet-like [70] architecture and combines multi-head self-attention and convolutional layers in parallel for mutual benefit.…”
Section: Cross-view Synthesis (mentioning, confidence: 99%)
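The citing work describes, at a high level, a generative module with a Unet-like encoder-decoder that runs multi-head self-attention and convolution in parallel so that global context and local semantics complement each other. As a rough illustration of such a parallel attention-plus-convolution block (a sketch under assumed channel sizes and fusion choices, not the cited paper's module), one could write:

```python
# Hypothetical sketch: a block that runs multi-head self-attention (global
# context) and convolution (local semantics) in parallel and fuses the two,
# as could be stacked inside a Unet-like encoder/decoder.
import torch
import torch.nn as nn


class ParallelAttnConvBlock(nn.Module):
    def __init__(self, channels: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, n_heads, batch_first=True)
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)  # merge the two branches

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C) tokens
        attn_out, _ = self.attn(tokens, tokens, tokens)  # global branch
        attn_map = attn_out.transpose(1, 2).reshape(b, c, h, w)
        conv_map = self.conv(x)                          # local branch
        return x + self.fuse(torch.cat([attn_map, conv_map], dim=1))


# Example: one block applied to a feature map from an encoder stage.
feats = torch.randn(2, 128, 32, 32)
out = ParallelAttnConvBlock()(feats)   # same shape as the input
```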