2023
DOI: 10.1109/lgrs.2022.3230973
TransMIN: Transformer-Guided Multi-Interaction Network for Remote Sensing Object Detection

Cited by 8 publications (3 citation statements)
References 33 publications
“…In recent years, the transformer [28], initially proposed for natural language processing (NLP), has garnered significant attention and has been applied in various computer vision domains [29,30]. The transformer introduces self-attention mechanisms to replace convolutional operators, thereby enabling models to better learn global contextual information from images.…”
Section: Transformer-Based Methods for Road Extraction (mentioning)
confidence: 99%
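The statement above hinges on self-attention replacing convolutional operators to capture global context. As a minimal, illustrative sketch (not tied to any specific cited model; all names are hypothetical), scaled dot-product self-attention over flattened image patches can be written in PyTorch as follows:

import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, tokens, dim) -- e.g. image patches flattened to a sequence
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5  # all-pairs token similarity
    weights = F.softmax(scores, dim=-1)                   # (batch, tokens, tokens)
    # Each output token aggregates information from every token,
    # unlike a convolution's fixed local receptive field.
    return weights @ v

x = torch.randn(2, 196, 64)                # a 14x14 patch grid with 64-dim embeddings
w_q, w_k, w_v = (torch.randn(64, 64) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)     # (2, 196, 64)

Because every token attends to every other token, the model learns global contextual information in a single layer, which is the property the quoted statement contrasts with convolution.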
“…Method | Problem solved | Optimization strategy
Scene text detection [129] | Complex background | Based on a few representative features
GANsformer [130] | Complex background | Combining GAN and transformer
Hyneter [131] | Complex background | HNB+DS
TPH-YOLO [132] | Complex background | Transformer prediction head
RODFormer [133] | Boundary-arbitrary discontinuity | A spatial-FFN feedforward network
TransConvNet [134] | Boundary-arbitrary discontinuity | Combines CNN and self-attention
AO2-DETR [135] | Boundary-arbitrary discontinuity | Generates explicit orientation suggestions
O2DETR [136] | Boundary-arbitrary discontinuity | Applies transformer to locate objects
EFNet [137] | Unreliable bounding boxes | FCF+RCF
TRD [138] | Inadequate expression | Aggregates features at multiple scales
LPSW [139] | Inadequate expression | Combining transformers and CNNs
TransMIN [140] | Inadequate expression | LGFI+CVFI
ViT-YOLO [141] | Inadequate expression | MHSA-Darknet+BiFPN…”
Section: Problem Solved / Optimization Strategies (mentioning)
confidence: 99%
“…Additionally, the Swin Transformer is enhanced by combining the strengths of transformers and CNNs, yielding a locally sensing Swin Transformer [139] that improves detection accuracy for small-scale targets. For local-global feature interaction (LGFI), a transformer-guided multi-interaction network (TransMIN) [140] is proposed, which learns complementary features using convolutions and transformers within the residual blocks of the backbone network. Cross-view feature interaction (CVFI) is also implemented with transformers at the FPN pyramid layers to capture the correlation between reference features and pyramid features.…”
Section: Problem Solved / Optimization Strategies (mentioning)
confidence: 99%
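To make the two interaction patterns described above concrete, here is a minimal PyTorch sketch: an LGFI-style residual block that fuses a convolutional (local) branch with a self-attention (global) branch, and a CVFI-style step in which pyramid features attend to reference features via cross-attention. All class and variable names are hypothetical; this is a reading aid, not the TransMIN authors' implementation.

import torch
import torch.nn as nn

class LGFIBlock(nn.Module):
    # Hypothetical local-global feature interaction: a conv branch (local)
    # and a self-attention branch (global) fused residually.
    def __init__(self, channels, heads=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                          # x: (B, C, H, W)
        local = self.conv(x)                       # convolution: local detail
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)           # (B, H*W, C) token sequence
        glob, _ = self.attn(t, t, t)               # self-attention: global context
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return x + local + glob                    # residual fusion of both views

class CVFIStep(nn.Module):
    # Hypothetical cross-view feature interaction: pyramid tokens query
    # reference tokens, capturing the correlation between the two views.
    def __init__(self, channels, heads=4):
        super().__init__()
        self.cross = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, pyramid, reference):         # both: (B, C, H, W)
        b, c, h, w = pyramid.shape
        q = pyramid.flatten(2).transpose(1, 2)     # queries from the pyramid level
        kv = reference.flatten(2).transpose(1, 2)  # keys/values from the reference view
        out, _ = self.cross(q, kv, kv)
        return pyramid + out.transpose(1, 2).reshape(b, c, h, w)

x = torch.randn(2, 64, 32, 32)
y = LGFIBlock(64)(x)                               # (2, 64, 32, 32)
z = CVFIStep(64)(y, torch.randn(2, 64, 32, 32))    # pyramid attends to reference

The design choice the statement highlights is complementarity: the conv branch keeps fine local texture while the attention branch supplies scene-level context, and adding both back onto the residual path lets the backbone use whichever cue each object needs.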