EAPT: Efficient Attention Pyramid Transformer for Image Processing

Lin, Xiao; Sun, Shuzhou; Huang, Wei; Sheng, Bin; Li, Ping; Feng, David Dagan

doi:10.1109/tmm.2021.3120873

Cited by 228 publications

(46 citation statements)

References 44 publications


“…To solve the problem that ViT requires a large amount of computation, [17] proposes swin transformer, which introduces a shift window-based attention mechanism to reduce computational cost. [18] uses transformer for image processing. [19] uses the transformer decoder architecture for multi-label classification.…”

Section: Pedestrian Attribute Recognition (mentioning)

confidence: 99%


Attribute correlation mask fusion network for pedestrian attribute recognition

Li, Zhang, Teng et al. 2024

Preprint

The main goal of Pedestrian Attribute Recognition (PAR) is to identify various attributes of pedestrians captured in video surveillance. PAR is a challenging task because pedestrian attribute labels span numerous categories and the correlations among attributes are complex and easily overlooked. Traditional methods usually treat each attribute independently, ignoring the possible intrinsic correlations between attributes. We design a pedestrian attribute recognition network, ACMFNet, which can fuse the uniqueness features of pedestrian attributes with attribute correlation features. Specifically, we propose an attribute correlation query module (ACQM), which is used to learn discriminative attribute features. Then, we construct a mask fusion module (MFM) to automatically learn the importance of the image feature and the attribute correlation feature. To better distinguish the modality differences between images and attribute texts, we propose a modality prompt. Experimental results show that our method can significantly enhance the network’s ability to recognize pedestrian attributes. On three pedestrian attribute recognition datasets, PA100K, PETA, and UAV-Human, our proposed method shows competitive performance compared to the state-of-the-art methods. Our source code is available at https://github.com/luffy-op/ACMFNet.


“…However, these methods frequently fail to capture complex interactions between different feature domains. More recent advancements have introduced more sophisticated methods, such as feature pyramid networks [18] and attention-based fusion [26], which better address the integration of multi-level and multi-modal data.…”

Section: Feature Fusion (mentioning)

confidence: 99%

Attribute correlation mask fusion network for pedestrian attribute recognition

Li, Zhang, Teng et al. 2024

Preprint

“…CaiT [15] introduces a class-attention layer which can separate the contradictory objectives of guiding attention. In addition, there are many works [16,17,18,19,20] focusing on the application of vision transformers.…”

Section: A Vision Transformers (mentioning)

confidence: 99%

Quadcopter Drone for Vision-Based Autonomous Target Following

et al. 2023


Unmanned aerial vehicles (UAVs) are becoming popular in various applications. However, there are still challenging issues to be tackled, such as effective obstacle avoidance, target identification within a crowd, and specific target tracking. This paper focuses on dynamic target following and obstacle avoidance to realize a prototype of a quadcopter drone to serve as an autonomous object follower. An adaptive target identification system is proposed to recognize the specific target in the complicated background. For obstacle avoidance during flight, we introduce an idea of space detection and use it to develop a so-called contour and spiral convolution space detection (CASCSD) algorithm to evade obstacles. Thanks to the low architecture complexity, it is appropriate for implementation on onboard flight control systems. The target prediction is integrated with fuzzified flight control to fulfill an autonomous target tracker. When this series of technical research and development is completed, this system can be used for applications such as personal security guard and criminal detection systems.


“…The Transformer (Vaswani et al, 2017), a novel neural network, was first applied to natural language processing (NLP) tasks, such as machine translation and English constituency analysis tasks, and achieved significant improvements in results. In the field of computer vision, Transformer-based models mainly use the key module self-attention mechanism to extract intrinsic features and show great potential in artificial intelligence applications, such as high-resolution image synthesis (Dalmaz, Yurt & Çukur, 2021), object detection (Carion et al, 2020), classification (Yuan et al, 2021), segmentation (Zheng et al, 2021), image processing (Lin et al, 2021), and re-identification (Luo et al, 2020). Furthermore, in vision applications, CNNs have previously been considered the fundamental component, but now the transformer shows that it will be a potential replacement for CNNs.…”

Section: Introduction (mentioning)

confidence: 99%

S-Swin Transformer: simplified Swin Transformer model for offline handwritten Chinese character recognition

Dan, Zhu, Jin et al. 2022

PeerJ Computer Science

The Transformer shows good prospects in computer vision. However, the Swin Transformer model has the disadvantage of a large number of parameters and high computational effort. To effectively solve these problems of the model, a simplified Swin Transformer (S-Swin Transformer) model was proposed in this article for handwritten Chinese character recognition. The model simplifies the initial four hierarchical stages into three hierarchical stages. In addition, the new model increases the size of the window in the window attention; the number of patches in the window is larger; and the perceptual field of the window is increased. As the network model deepens, the size of patches becomes larger, and the perceived range of each patch increases. Meanwhile, the purpose of shifting the window’s attention is to enhance the information interaction between the window and the window. Experimental results show that the verification accuracy improves slightly as the window becomes larger. The best validation accuracy of the simplified Swin Transformer model on the dataset reached 95.70%. The number of parameters is only 8.69 million, and FLOPs are 2.90G, which greatly reduces the number of parameters and computation of the model and proves the correctness and validity of the proposed model.
