2021
DOI: 10.1007/s11633-020-1274-8

Advances in Deep Learning Methods for Visual Tracking: Literature Review and Fundamentals

Abstract: Recently, deep learning has achieved great success in visual tracking tasks, particularly in single-object tracking. This paper provides a comprehensive review of state-of-the-art single-object tracking algorithms based on deep learning. First, we introduce basic knowledge of deep visual tracking, including fundamental concepts, existing algorithms, and previous reviews. Second, we briefly review existing deep learning methods by categorizing them into data-invariant and data-adaptive methods based on whether …

Cited by 18 publications (3 citation statements)
References 185 publications
“…The core objective of DL research is to develop systems capable of learning intricate patterns from data and executing tasks with minimal human intervention. This spectrum of tasks encompasses diverse applications, ranging from automatic speech recognition [91] and multilingual text translation [105] to object tracking in videos [146] and analysis of medical imaging for disease diagnosis [85].…”
Section: Deep Learning (mentioning)
confidence: 99%
“…Yang et al [134] proposed a cross-modal relationship extractor (CMRE) to adaptively highlight objects and relationships with a cross-modal attention mechanism, and represented the extracted information as a language-guided visual relation graph. Furthermore, Yang et al [22] proposed a cross-modal relationship extractor to adaptively highlight objects and relationships (spatial and semantic relations) related to the given expression with a cross-modal attention mechanism, and represent the extracted information as a language-guided visual relation graph. Yang et al [150] proposed a scene graph-guided modular network (SGMN), which performed reasoning over a semantic graph and a scene graph with neural modules under the guidance of the linguistic structure of the expression.…”
Section: Visual Representation Learning: State-of-the-art (mentioning)
confidence: 99%
“…Object tracking is constantly determining a moving object's trajectory from measurements taken by one or more sensors [1]. Single-object tracking (SOT) [2] and Multi-object tracking (MOT) [3][4][5][6][7] are two main categories of object tracking methods (MOT). When using SOT, the tracker follows a single, predetermined object.…”
Section: Introduction (mentioning)
confidence: 99%