2022
DOI: 10.1109/tcsvt.2021.3135013

A Transformer-Based Feature Segmentation and Region Alignment Method for UAV-View Geo-Localization

Cited by 90 publications (46 citation statements)
References 47 publications
“…The Detection Transformer (DEtection TRansformer, DETR) [40], [41] uses an ensemble global loss that makes predictions through bilateral matching and a classical encoder-decoder architecture, which contains three components: a CNN-based backbone to extract feature representations, a Transformer pretraining model to enhance features, and a simple feedforward network (FFN) for performing the object detection prediction. The detailed structure is shown in Figure 3. Starting from an initial image $x_{\mathrm{img}} \in \mathbb{R}^{3 \times H_0 \times W_0}$ (3 color channels; the input images are batched together with sufficient zero padding to have the same dimensions $(H_0, W_0)$ as the largest image in the same batch), a convolutional network then generates an activation map $f \in \mathbb{R}^{C \times H \times W}$ with lower resolution.…”
Section: A Real-time Target Detection Based On Transformer (mentioning)
confidence: 99%
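The batching step described in this statement (zero-padding every image to the size of the largest image in the batch) can be sketched as follows. This is a minimal illustration, not the cited paper's code: the function name `pad_batch` and the returned padding mask are assumptions introduced here, and NumPy stands in for whatever framework the authors used.

```python
import numpy as np


def pad_batch(images):
    """Zero-pad a list of (3, H, W) arrays to a common (H0, W0).

    H0 and W0 are the largest height and width found in the batch.
    Returns the padded batch of shape (B, 3, H0, W0) and a boolean mask of
    shape (B, H0, W0) that is True on padded pixels (the mask is an
    illustrative extra, not something the quoted text specifies).
    """
    h0 = max(img.shape[1] for img in images)
    w0 = max(img.shape[2] for img in images)
    batch = np.zeros((len(images), 3, h0, w0), dtype=np.float32)
    mask = np.ones((len(images), h0, w0), dtype=bool)
    for i, img in enumerate(images):
        _, h, w = img.shape
        batch[i, :, :h, :w] = img   # copy the image into the top-left corner
        mask[i, :h, :w] = False     # mark real (non-padded) pixels
    return batch, mask
```

The padded batch would then be passed to the convolutional backbone, which produces the lower-resolution activation map mentioned in the statement; the downsampling factor is not given in the excerpt, so it is left out here.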
“…1) Construct the image pyramid and, at the same time, extract the FAST corner points for each pyramid layer using a uniform extraction strategy based on a quadtree [41]. The specific calculation process is described as follows:…”
Section: B Orb Feature Extraction (mentioning)
confidence: 99%
“…GeoNet [33] learned powerful intermediate feature maps, allowed the stable propagation of gradients in deep CNNs, and utilized the capsule network to encapsulate the intermediate feature maps into several capsules. FSRA [34] introduced a simple and efficient transformer-based structure to enhance the ability of the model to understand contextual information as well as to understand the distribution of instances.…”
Section: B Deeply-learned Geo-localization (mentioning)
confidence: 99%
“…Zheng et al. [19] establish the first drone-based multi-source cross-view matching dataset, University-1652, which contains three views: street view, drone, and satellite, and they also publish a baseline by designing a multi-branch CNN network. [20]-[24] conduct a more in-depth study of University-1652 and significantly improve the accuracy of the matching system. However, University-1652 still has the following problems: 1. University-1652 uses synthetic images of drone views, which lack real-world lighting variations.…”
Section: Introduction (mentioning)
confidence: 98%