2018
DOI: 10.48550/arxiv.1811.10014
Preprint

Describe and Attend to Track: Learning Natural Language guided Structural Representation and Visual Attention for Object Tracking

Xiao Wang, Chenglong Li, Rui Yang, et al.

Abstract: The tracking-by-detection framework requires a set of positive and negative training samples to learn robust tracking models for precise localization of target objects. However, existing tracking models mostly treat different samples independently while ignoring the relationship information among them. In this paper, we propose a novel structure-aware deep neural network to overcome such limitations. In particular, we construct a graph to represent the pairwise relationships among training samples, and additiona…
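The abstract's core idea, building a graph whose nodes are training samples and whose edges encode pairwise relationships, can be illustrated with a minimal sketch. This is not the authors' implementation (the abstract is truncated and gives no details); it simply assumes, for illustration, that edges are formed by thresholding cosine similarity between sample feature vectors. The function name `build_sample_graph` and the threshold parameter are hypothetical.

```python
import numpy as np

def build_sample_graph(features, threshold=0.5):
    """Illustrative sketch: adjacency matrix over training samples,
    with an edge wherever two samples' feature vectors have cosine
    similarity above `threshold`. Not the paper's actual method."""
    # Normalize each feature vector to unit length (avoid divide-by-zero).
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.clip(norms, 1e-12, None)
    # Pairwise cosine similarity via a dot product of unit vectors.
    sim = normed @ normed.T
    # Keep edges above the threshold; drop self-loops on the diagonal.
    adj = (sim > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj
```

A structure-aware model would then propagate information along these edges (e.g. with graph convolutions) instead of treating each sample independently.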

Cited by 10 publications (33 citation statements)
References 36 publications
“…1 (a), the tracker may be confused to track the bike or lower body of the pedestrian. Similar views can also be found in [21,43,65,78]. (3) Current BBox-based trackers may perform poorly when facing abrupt appearance variation of the target object, like face/cloth changing or species variation in Fig.…”
Section: Introduction (mentioning)
confidence: 71%
“…Lingual Specification Only; Lingual First, then Visual Specification; Lingual and Visual Specification). Wang [65] and Feng [21] also propose to use the language information to generate global proposals for tracking. Yang et al…”
Section: Related Work (mentioning)
confidence: 99%
“…Results on LaSOT [46]: LaSOT is the largest dataset for long-term tracking, containing 1400 videos (Protocol-I employs all the videos for testing, while Protocol-II employs 280 videos). In this paper, we adopt Protocol-II for the evaluation of our tracker and compared trackers, including CSRDCF [42], Lang-Tracker [94], ECO [4], DSiam [95], VITAL [35], THOR [80], MDNet [3], SiamRPN++ [13], Ocean [89], SiamFC++ [52], LTMU [65]. As shown in Fig.…”
Section: Comparison on Public Benchmarks (mentioning)
confidence: 99%