2022
DOI: 10.3390/rs14246354

An Improved End-to-End Multi-Target Tracking Method Based on Transformer Self-Attention

Abstract: Current multi-target multi-camera tracking algorithms face increasing demands on re-identification accuracy and tracking reliability. This study proposed an improved end-to-end multi-target tracking algorithm, based on the self-attention mechanism of the transformer’s encoder–decoder structure, that adapts to multi-view, multi-scale scenes. A multi-dimensional feature extraction backbone network was combined with a self-built raster semantic map which was stored in the encoder for correlation and generated…

Citations: cited by 4 publications (1 citation statement)
References: 44 publications
“…CNN-based multi-object tracking algorithms have limitations due to their reliance on local perception, difficulty in modeling long-term dependencies and capturing global features, and the negative impact of spatial invariance and pooling operations on tracking accuracy. Transformer has shown great success in various vision tasks due to its ability to capture global features and long-term modeling [32]-[36]. Therefore, using only CNN or Transformer as the detector of the tracking network cannot adequately capture global and local features [51], [52], especially in the UAV scenario.…”
Section: Introduction (mentioning)
confidence: 99%