2015
DOI: 10.1016/j.patcog.2015.02.012

Self-taught learning of a deep invariant representation for visual tracking via temporal slowness principle

Abstract: Visual representation is crucial to the performance of a visual tracking method. Conventionally, the visual representations adopted in visual tracking rely on hand-crafted computer vision descriptors. These descriptors were developed generically, without considering tracking-specific information. In this paper, we propose to learn complex-valued invariant representations from tracked sequential image patches, via a strong temporal slowness constraint and stacked convolutional autoencoders. The deep slow local representa…
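To make the idea of a temporal slowness constraint concrete, here is a minimal sketch, assuming a generic formulation rather than the paper's own: it measures how much a feature vector changes between consecutive frames of a tracked patch, so that slowly varying (more invariant) representations score lower. The function name `slowness_penalty`, the L1 distance, and the toy feature values are all illustrative assumptions; the abstract does not specify the exact loss or the complex-valued encoding.

```python
def slowness_penalty(features):
    """Mean L1 change between consecutive per-frame feature vectors.

    `features` is a list of equal-length lists of floats, one per frame.
    This is an assumed, generic slowness term, not the paper's exact loss.
    """
    if len(features) < 2:
        return 0.0  # a single frame has no temporal change to penalize
    total = 0.0
    for prev, curr in zip(features, features[1:]):
        total += sum(abs(a - b) for a, b in zip(prev, curr))
    return total / (len(features) - 1)


# Toy feature sequences (hypothetical values): one stable, one fluctuating.
steady = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
jumpy = [[1.0, 2.0], [3.0, 0.0], [1.0, 2.0]]

print(slowness_penalty(steady))  # stable features incur no penalty
print(slowness_penalty(jumpy))   # fluctuating features incur a larger penalty
```

In a learning setting, a term of this kind would typically be added to an autoencoder's reconstruction loss so that the encoder is encouraged to produce features that stay stable while the tracked object changes appearance.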

Cited by 37 publications
(24 citation statements)
References 43 publications
“…For complex applications, however, imbedding proposed models in current security systems becomes necessary, such as compressive sensing for sparse tracking [18] (it can be improved as locally compressive sensing within ROI), VIBE algorithm for real-time object detection from a moving camera [19], Adaboost algorithm for noise-detection in ROI [20], optical flow for robots' recognition of environments [21], SVM clustering for accidents classification [22], deep learning algorithms for anomaly detection, crow analysis, and hierarchical tracking within ROI [23][24][25][26][27]. Objects understanding and detection in dynamic environment changes are usually based on the adaptive background subtraction and other objects recognition methods [17,21,35,[65][66][67][68].…”
Section: Simulation and Discussion (mentioning)
confidence: 99%
“…Numerous algorithms have been developed to tackle video recognition challenges in various environments; however, a full understanding of environmental implications to video recognition efficiency demands learning models with universal significance (ignoring uncontrolled differences in real scenarios) [18][19][20][21][22][23][24][25][26][27]. That is the essential reason why the current online algorithms, even for latest algorithms, for example, the latest models for tackling crowd segmentation for the high-dimensional, large-scale anomaly detection, still encounter considerable uncertainties [23,24].…”
Section: Introduction (mentioning)
confidence: 99%
“…Indeed, it is more effective to exploit target specific representations through a learning process rather than using a fixed set of pre-defined features [178]. Deep neural networks, especially convolutional neural networks (CNN) have recently proposed for moving object tracking which effectively use category specific features for tracking and show some promising results even in the case of complex moving camera [179,180,181].…”
Section: Extract Target Features (mentioning)
confidence: 99%
“…Although the currently used hand-crafted features produce acceptable tracking results, it is always preferred to leverage more descriptive features. Therefore, it is more beneficial to exploit target-specific representations through a learning process rather than using a fixed set of pre-defined features [23].…”
Section: Introduction (mentioning)
confidence: 99%