Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413583

Dual Semantic Fusion Network for Video Object Detection

Abstract: Video object detection is a tough task due to the deteriorated quality of video sequences captured under complex environments. Currently, this area is dominated by a series of feature enhancement based methods, which distill beneficial semantic information from multiple frames and generate enhanced features through fusing the distilled information. However, the distillation and fusion operations are usually performed at either frame level or instance level with external guidance using additional information, s…
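The abstract distinguishes fusion at frame level from fusion at instance level. As a rough illustration of the instance-level case only, here is a minimal NumPy sketch that assumes proposal (RoI) features have already been pooled from the reference frame and from other (support) frames. It is a generic attention-weighted aggregation, not the Dual Semantic Fusion Network itself. The function name `fuse_instance_features` is hypothetical, and the learned query/key/value projections a trained model would normally use are deliberately left out.

```python
import numpy as np


def fuse_instance_features(ref_rois: np.ndarray, support_rois: np.ndarray) -> np.ndarray:
    """Enhance reference-frame proposal features with support-frame proposals.

    ref_rois: (N, D) features of N proposals in the reference frame.
    support_rois: (M, D) features of M proposals pooled from other frames.
    Returns (N, D) enhanced features.

    Hypothetical helper. Assumption: no learned projections, so the raw
    features serve directly as query, key and value.
    """
    d = ref_rois.shape[1]
    # Scaled dot-product similarity between every reference and support proposal.
    logits = ref_rois @ support_rois.T / np.sqrt(d)  # (N, M)
    # Row-wise softmax (shifted by the row max for numerical stability).
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)  # each row sums to 1
    # Attention-weighted sum of support features, added back as a residual.
    return ref_rois + attn @ support_rois
```

With a single support proposal every attention weight is 1, so each reference feature simply gains that support feature as a residual; with several support proposals, the more similar ones contribute more.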

Cited by 24 publications (11 citation statements)
References: 56 publications
“…For the first aspect, most previous works [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [27], [28], [29] to amend this problem is feature aggregation that enhances per-frame features by aggregating the features of nearby frames. Earlier works adopt flow-based warping to achieve feature aggregation.…”
Section: Related Work (mentioning)
confidence: 99%
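The quoted passage notes that earlier works aggregate nearby-frame features through flow-based warping. A minimal NumPy sketch of that warping step follows, assuming a dense flow field from the reference frame to a neighbour frame is already available from some optical-flow estimator (not shown). `warp_features` is a hypothetical name, and clamping sample positions to the image border is a simplifying assumption; this illustrates the general technique, not the implementation of any cited work.

```python
import numpy as np


def warp_features(feat: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a neighbour frame's features onto the reference frame.

    feat: (C, H, W) feature map of the neighbour frame.
    flow: (2, H, W) displacement from reference to neighbour, in pixels;
          flow[0] is the horizontal (x) offset, flow[1] the vertical (y) offset.
    Returns (C, H, W): for each reference pixel (y, x), the neighbour feature
    bilinearly sampled at (y + flow[1], x + flow[0]).
    """
    _, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Sampling positions, clamped to the image (assumption: border replication;
    # zero padding would be another reasonable choice).
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = sx - x0  # fractional offsets in [0, 1]
    wy = sy - y0
    # Bilinear interpolation; each gather below has shape (C, H, W).
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy
```

A zero flow field returns the neighbour features unchanged; the warped maps from several neighbours can then be combined with the reference frame's own features, for example by averaging or by similarity-based weights.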
“…Faster RCNN 80.6
LRTR [35] Faster RCNN 81.0
RDN [34] Faster RCNN 81.8
TROI [72] Faster RCNN 82.0
MEGA [23] Faster RCNN 82.9
HVRNet [18] Faster RCNN 83.2
TF-Blender [73] Faster RCNN 83.8
DSFNet [20] Faster RCNN 84.1
MAMBA [24] Faster RCNN 84.6
EBFA [19] Faster RCNN 84.8
CFA-Net [74] Faster…”
Section: Minet [?] (mentioning)
confidence: 99%
“…However, extending these methods to VIS remains a challenging work. Similar to other video-based recognition tasks, such as video object segmentation (VOS) [49,50,33], video object detection (VOD) [53,37] and multi-object tracking (MOT) [16,21,59,75], continuous video sequences always bring great challenges, e.g., a huge number of frames required to be fast recognized, heavy occlusion, object disappearing and unconventional object-to-camera poses [18].…”
Section: Introduction (mentioning)
confidence: 99%
“…This task extends the traditional instance segmentation to the temporal domain and requires detecting, classifying, segmenting, and tracking visual instances simultaneously in the given videos. Similar to other video based tasks like Video Object Segmentation [12,25] and Video Object Detection [14], video instance segmentation provides a natural understanding of video scenes. Achieving accurate and robust video instance segmentation in real-world scenarios can greatly promote the development of video analysis.…”
Section: Introduction (mentioning)
confidence: 99%