2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019
DOI: 10.1109/iccvw.2019.00280
Visual Object Tracking by Using Ranking Loss

Cited by 7 publications (2 citation statements)
References 38 publications
“…Therefore, any task that requires processing continuous video streams can benefit from D3D, including action/activity recognition with datasets such as UCF-101 [38], HMDB [26], and Kinetics [3]; video object detection, such as ImageNet VID [34]; spatiotemporal action localization, such as the Atomic Visual Actions (AVA) dataset [10]; video object tracking (VOT), such as [24,45]; multi-object tracking (MOT), such as [7]; video person re-identification, such as the MARS (Motion Analysis and Re-identification Set) dataset [52]; gait recognition, such as the Casia-B dataset [50]; video face recognition, such as YouTube Faces [43]; and many other tasks. Currently, state-of-the-art architectures either use only spatial content by processing the input frame by frame, ignoring the temporal content [4,42,2], or utilize offline-trained 3D CNN architectures in a non-dynamic way [20,1]. By utilizing our proposed D3D architecture, all video-based computer vision tasks can incorporate temporal information.…”
Section: Related Work
confidence: 99%
“…A fast version of the MDNet tracker, called Real-Time MDNet [32], used the Fast R-CNN [20] approach to accelerate the slow feature extraction stage of the MDNet method. Cevikalp et al. [7] proposed a deep neural network tracker using a ranking loss, which encourages the network to return better bounding boxes framing the target object. For the same purpose, both [11,3] used novel deep neural tracking architectures that utilize IoU-Net [30], whose goal is to estimate and increase the Intersection over Union (IoU) overlap between the target and the estimated bounding box to improve accuracy.…”
Section: Related Work
confidence: 99%
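
The ranking-loss idea described in the excerpt above can be illustrated with a short sketch. What follows is a minimal, hypothetical PyTorch example, not the authors' exact formulation from the paper: candidate boxes are compared pairwise, and a hinge penalty is applied whenever a candidate with lower IoU against the ground truth scores at least as high as one with higher IoU. The function names, the margin value, and the IoU-based pair selection are illustrative assumptions.

import torch

def iou(boxes_a: torch.Tensor, boxes_b: torch.Tensor) -> torch.Tensor:
    """IoU for boxes in (x1, y1, x2, y2) format; both shapes are (N, 4)."""
    x1 = torch.maximum(boxes_a[:, 0], boxes_b[:, 0])
    y1 = torch.maximum(boxes_a[:, 1], boxes_b[:, 1])
    x2 = torch.minimum(boxes_a[:, 2], boxes_b[:, 2])
    y2 = torch.minimum(boxes_a[:, 3], boxes_b[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    return inter / (area_a + area_b - inter + 1e-7)

def pairwise_ranking_loss(scores: torch.Tensor, ious: torch.Tensor,
                          margin: float = 0.1) -> torch.Tensor:
    """Hinge loss over all candidate pairs (i, j) with IoU_i > IoU_j:
    penalize pairs where score_i does not exceed score_j by the margin.
    `margin` is an assumed hyperparameter, not taken from the paper."""
    # Pairwise differences via broadcasting: diff[i, j] = value_i - value_j.
    score_diff = scores.unsqueeze(1) - scores.unsqueeze(0)
    iou_diff = ious.unsqueeze(1) - ious.unsqueeze(0)
    # Keep only pairs where candidate i has strictly higher IoU than j.
    valid = iou_diff > 0
    losses = torch.clamp(margin - score_diff, min=0)[valid]
    # Return a zero that keeps the autograd graph if no pair qualifies.
    return losses.mean() if losses.numel() > 0 else scores.sum() * 0.0

# Toy usage: jitter candidates around a ground-truth box; random scores
# stand in for the tracking network's outputs.
gt = torch.tensor([[10.0, 10.0, 50.0, 50.0]])
candidates = gt + torch.randn(8, 4) * 3.0
ious = iou(candidates, gt.expand(8, 4))
scores = torch.randn(8, requires_grad=True)
loss = pairwise_ranking_loss(scores, ious)
loss.backward()

Training with such a loss pushes the network to score higher-IoU candidates above lower-IoU ones, which matches the excerpt's description of both the ranking-loss tracker [7] and the IoU-estimation goal of IoU-Net [30], although the actual architectures differ from this sketch.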