2024
DOI: 10.1016/j.eswa.2023.122804
SAM-Net: Self-Attention based Feature Matching with Spatial Transformers and Knowledge Distillation

Benjamin Kelenyi, Victor Domsa, Levente Tamas

Cited by 4 publications (1 citation statement)
References 33 publications
“…The network structure is shown in Figure 5. As shown in the figure, two 1D global pooling operations are utilized to aggregate the input features along the vertical and horizontal directions, respectively. The input feature maps come from the channel-attention-weighted feature map T_se [39] and the spatial-attention-weighted feature map T_sam [40], respectively, forming two separate direction-aware feature maps: T_se,h in the H direction and T_sam,w in the W direction. These two feature maps, embedded with direction-specific information, are encoded as two separate attention maps, each of which captures the long-range dependencies of the input feature map along one spatial direction, thus preserving positional information.…”
Section: Local Global Feature Coordination Enhancement Module (LGFE)
confidence: 99%
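The mechanism quoted above (per-direction 1D global pooling followed by direction-specific attention maps) resembles a coordinate-attention-style design. Below is a minimal PyTorch sketch of that idea under stated assumptions; the module name DirectionalAttention, the inputs t_se and t_sam, and the reduction parameter are illustrative choices, not the cited paper's implementation.

```python
# Minimal sketch (assumption-based, not the cited paper's code): two 1D global
# pooling operations aggregate features along H and W separately, and each
# pooled vector is turned into a direction-specific attention map that
# preserves positional information along its axis.
import torch
import torch.nn as nn


class DirectionalAttention(nn.Module):
    """Pools one input along W and the other along H, then produces two
    direction-aware attention maps used to reweight the inputs."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 8)
        # 1D global pooling: one per spatial direction.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # -> (B, C, 1, W)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
        )
        self.attn_h = nn.Conv2d(hidden, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, t_se: torch.Tensor, t_sam: torch.Tensor) -> torch.Tensor:
        # t_se: channel-attention-weighted features, pooled along W (H-direction map).
        # t_sam: spatial-attention-weighted features, pooled along H (W-direction map).
        h_vec = self.pool_h(t_se)                              # (B, C, H, 1)
        w_vec = self.pool_w(t_sam)                             # (B, C, 1, W)
        a_h = torch.sigmoid(self.attn_h(self.shared(h_vec)))   # attention over H
        a_w = torch.sigmoid(self.attn_w(self.shared(w_vec)))   # attention over W
        # Each attention map modulates one spatial direction of its input.
        return t_se * a_h + t_sam * a_w


if __name__ == "__main__":
    x1 = torch.randn(2, 64, 32, 32)
    x2 = torch.randn(2, 64, 32, 32)
    out = DirectionalAttention(64)(x1, x2)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Because each attention map has a singleton dimension along the axis it was pooled over, broadcasting applies it uniformly across that axis while varying along the other, which is how long-range dependencies in a single spatial direction are captured.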