2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00712

Relation Distillation Networks for Video Object Detection

Abstract: It has been well recognized that modeling object-to-object relations would be helpful for object detection. Nevertheless, the problem is not trivial especially when exploring the interactions between objects to boost video object detectors. The difficulty originates from the aspect that reliable object relations in a video should depend on not only the objects in the present frame but also all the supportive objects extracted over a long range span of the video. In this paper, we introduce a new design to captu…

citations
Cited by 225 publications
(181 citation statements)
references
References 46 publications
“…                       Causal?  Backbone                     mAP(%)  mAP gain(%)
T-CNN [13]               No       GoogLeNet + VGG + Fast-RCNN  73.8    6.1
MANet [14]               No       ResNet101 + R-FCN            78.1    4.5
FGFA [16]                No       ResNet101 + R-FCN            78.4    5.0
Scale-time lattice [20]  No       ResNet101 + Faster R-CNN     79.6    N/A
Object linking [30]      No       ResNet101 + Fast R-CNN       74.5    5.4
Seq-NMS [19]             No       VGG + Faster R-CNN           52.2    7.3
STMN [18]                No       ResNet101 + R-FCN            80.5    N/A
STSN [21]                No       ResNet101 + R-FCN            78.9    2.9
RDN [41]                 No       ResNet101 + Faster R-CNN     81.8    6.4
SELSA [42]               No       ResNet101 + Faster R-CNN     80.3    6.7
D&T [15]                 No
mance despite the fact that a less powerful detection network is used. Since our method focuses on causal video object detection where no future frames are allowed, no video-level post-processing is applied.…”
Section: Methods
Type: mentioning
confidence: 99%
“…Cuboid proposal network and tubelet linking algorithm are proposed in [30] to improve the performance of detecting moving objects in videos. In [41], objects' interactions are captured in spatio-temporal domain. Full-sequence level feature aggregation is proposed in [42] to generate robust features for video object detection.…”
Section: B Video Object Detection
Type: mentioning
confidence: 99%
“…STMN [22] adopts spatiotemporal memory module with spatial alignment mechanism to model long-term temporal appearance and motion dynamics. Besides, RDN [46] and SELSA [47] strengthen region-level features by exploiting the relation/affinity between region proposals across frames…”
Section: B Object Detection In Videos
Type: mentioning
confidence: 99%
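
The statement above describes strengthening region-level features through the relation/affinity between region proposals across frames. As a rough illustration only, the sketch below uses a generic scaled dot-product affinity with a softmax and a residual sum. This is an assumed, simplified stand-in, not the actual RDN or SELSA formulation, and the names `aggregate`, `reference`, and `supports` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(reference, supports):
    """Enhance each reference-frame proposal feature with support proposals.

    reference: list of feature vectors for proposals in the current frame.
    supports:  list of feature vectors for proposals from other frames.
    Assumption: affinity is a scaled dot product; the cited papers may
    define it differently.
    """
    dim = len(reference[0])
    scale = math.sqrt(dim)
    enhanced = []
    for ref in reference:
        # Affinity of this proposal to every support proposal, normalized.
        weights = softmax([dot(ref, sup) / scale for sup in supports])
        # Affinity-weighted sum of support features.
        context = [
            sum(w * sup[k] for w, sup in zip(weights, supports))
            for k in range(dim)
        ]
        # Residual connection: keep the original feature, add the context.
        enhanced.append([r + c for r, c in zip(ref, context)])
    return enhanced
```

With one reference proposal and two identical support proposals, the weights are equal, so the output is the reference feature plus one copy of the support feature.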
“…Relational Reasoning. There has been strong evidences on the use of relational reasoning to support various tasks, e.g., object detection [11,14,15,16], feature learning [17], vision-language [18,19]. For example, [16] plugs non-local operation into the conventional CNN to enable the pixel-level relational interaction within feature maps, and [11] presents an object relation module to model the relations of regions via the interaction among appearance features and geometry.…”
Section: Related Work
Type: mentioning
confidence: 99%