2023
DOI: 10.1109/tmm.2021.3120873
EAPT: Efficient Attention Pyramid Transformer for Image Processing

Cited by 228 publications (46 citation statements)
References 44 publications
“…To solve the problem that ViT requires a large amount of computation, [17] proposes the Swin Transformer, which introduces a shifted-window attention mechanism to reduce computational cost. [18] uses the transformer for image processing. [19] uses the transformer decoder architecture for multi-label classification.…”
Section: Pedestrian Attribute Recognition
confidence: 99%
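The cost reduction mentioned in the statement above comes from restricting self-attention to local windows: attending within windows of w tokens costs O(n·w²) instead of the O(n²) of global attention. A minimal sketch of window-restricted attention (no learned projections or window shifting — names and shapes here are illustrative, not the Swin Transformer implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(x, window):
    """Self-attention computed independently inside non-overlapping
    windows of `window` tokens, so cost grows with n * window**2
    rather than n**2 (the idea behind window-based attention)."""
    n, d = x.shape
    assert n % window == 0, "sequence length must be divisible by window"
    out = np.empty_like(x)
    for start in range(0, n, window):
        w = x[start:start + window]            # (window, d) local tokens
        scores = w @ w.T / np.sqrt(d)          # (window, window) similarities
        out[start:start + window] = softmax(scores) @ w
    return out

tokens = np.random.default_rng(0).normal(size=(16, 8))
y = window_attention(tokens, window=4)
print(y.shape)  # (16, 8)
```

Shifting the windows between successive layers (as Swin does) lets information flow across window boundaries while keeping the per-layer cost linear in sequence length.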
“…However, these methods frequently fail to capture complex interactions between different feature domains. More recent advancements have introduced more sophisticated methods, such as feature pyramid networks [18] and attention-based fusion [26], which better address the integration of multi-level and multi-modal data.…”
Section: Feature Fusion
confidence: 99%
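Attention-based fusion, as contrasted above with simpler schemes, weights each feature level by a learned relevance score before combining. A toy sketch of the idea, assuming a fixed scoring function in place of a learned one (function and variable names are illustrative):

```python
import numpy as np

def attention_fuse(features):
    """Fuse multi-level feature vectors by a softmax-weighted sum.
    Real attention-based fusion learns the scoring function; here
    the per-level mean stands in as a fixed score for illustration."""
    f = np.stack(features)                  # (levels, d)
    scores = f.mean(axis=1)                 # one scalar score per level
    w = np.exp(scores - scores.max())
    w /= w.sum()                            # softmax over levels
    return (w[:, None] * f).sum(axis=0)     # (d,) fused feature

levels = [np.ones(4), 2 * np.ones(4), 3 * np.ones(4)]
fused = attention_fuse(levels)
print(fused.shape)  # (4,)
```

Because the weights depend on the features themselves, levels that carry more signal for the current input dominate the fused representation, which is what lets such methods capture interactions that fixed averaging misses.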
“…CaiT [15] introduces a class-attention layer which can separate the contradictory objectives of guiding attention. In addition, there are many works [16,17,18,19,20] focusing on the application of vision transformers.…”
Section: A Vision Transformers
confidence: 99%
“…The Transformer ( Vaswani et al, 2017 ), a novel neural network, was first applied to natural language processing (NLP) tasks, such as machine translation and English constituency analysis tasks, and achieved significant improvements in results. In the field of computer vision, Transformer-based models mainly use the key module self-attention mechanism to extract intrinsic features and show great potential in artificial intelligence applications, such as high-resolution image synthesis ( Dalmaz, Yurt & Ukur, 2021 ), object detection ( Carion et al, 2020 ), classification ( Yuan et al, 2021 ), segmentation ( Zheng et al, 2021 ), image processing ( Lin et al, 2021 ), and re-identification ( Luo et al, 2020 ). Furthermore, in vision applications, CNNs have previously been considered the fundamental component, but now the transformer shows that it will be a potential replacement for CNNs.…”
Section: Introduction
confidence: 99%
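The "key module self-attention mechanism" the statement above refers to is scaled dot-product attention. A minimal single-head sketch, using the input directly as query, key, and value for brevity (real transformers apply learned projections first):

```python
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product self-attention over a token
    sequence x of shape (n, d). Each output token is a weighted
    average of all tokens, with weights from pairwise similarity."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                       # (n, n) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ x                                  # (n, d)

x = np.random.default_rng(1).normal(size=(5, 4))
print(self_attention(x).shape)  # (5, 4)
```

Because every token attends to every other token, this module captures global context in one step — the property that makes transformers attractive for the detection, segmentation, and image-processing applications listed in the quote.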