Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications 2019
DOI: 10.5220/0007260002260233

Improving Video Object Detection by Seq-Bbox Matching

Cited by 24 publications (18 citation statements)
References 13 publications
“…Despite the great success of these approaches, most of the pipelines for video object detection are too sophisticated, requiring many hand-crafted components, e.g., extra optic flow model, memory mechanism, or recurrent neural network. In addition, most of them need complicated post-processing methods such as Seq-NMS [12], Tubelet rescoring [13], Seq-Bbox Matching [14] or REPP [15]. For the second aspect, starting from DFF [26], several works [16], [17], [33], [36], [37], [38], [52], [53] focus on realtime video object detection while keeping accuracy unchanged or even improved. In general, most of these works also perform specific architecture design with many hand-crafted components and human prior such as object-level tracker in [16], patchwork cell with attention in [53] and Convolutional LSTMs in [37].…”
Section: Related Work (mentioning)
confidence: 99%
“…Previous video object detection approaches mainly leverage temporal information in two different manners. The first one relies on post-processing of temporal information to make the object detection results more coherent and stable [12], [13], [14], [15]. These methods usually apply a still-image detector to obtain detection results, then associate the results.…”
Section: Introduction (mentioning)
confidence: 99%
“…The first one relies on post-processing of temporal information to make the object detection results more coherent and stable [1,19,27,31,41,47]. These methods usually apply a still-image detector to obtain detection results, then associate the results.…”
Section: Introduction (mentioning)
confidence: 99%
“…Despite the gratifying success of these approaches, most of the two-stage pipelines for video object detection are over sophisticated, requiring many hand-crafted components, e.g., optical flow model [50,58,60,61], recurrent neural network [5,9,16], deformable convolution fusion [2,20,25], relation networks [5,11,42]. In addition, most of them need complicated post-processing methods by linking the same object across the video to form tubelets and aggregating classification scores in the tubelets to achieve the state-of-the-art performance [1,19,27,41]. Thus, it is in desperate need to build a simple yet effective VOD framework in a fully end-to-end manner.…”
Section: Introduction (mentioning)
confidence: 99%