2021
DOI: 10.1109/access.2021.3087168
Appearance Guidance Attention for Multi-Object Tracking

Abstract: Appearance information is one of the most important matching indicators for multi-object data association. In tracking-by-detection models, appearance information and detection information are usually integrated in the same sub-network for learning and output. This causes the appearance embedding vectors to become coupled to the network inference method during the learning stage. As a result, the appearance embedding vectors contain too much background information, which affects the accuracy of data as…

citations
Cited by 3 publications
(1 citation statement)
references
References 47 publications
“…Filtering of detections is done using maximum suppression with a cost definition dependent on track and detection confidence. Chen et al. [19] also employ a trained Siamese network for historical appearance-based matching in addition to motion- and size-based matching, which especially helps reduce the identity shifts encountered during tracking. To calculate the total pair-wise cost between each detection and track, Song et al. [20] use two deep networks: the spatial attention network (which uses a Siamese architecture to compare detection bounding boxes and track history) and the temporal attention network (which uses a long short-term memory (LSTM) architecture).…”
Section: Introduction (mentioning)
confidence: 99%