2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00678

Object Guided External Memory Network for Video Object Detection

Abstract: Video object detection is more challenging than image object detection because of the deteriorated frame quality. To enhance the feature representation, state-of-the-art methods propagate temporal information into the deteriorated frame by aligning and aggregating entire feature maps from multiple nearby frames. However, restricted by feature map's low storage-efficiency and vulnerable content-address allocation, long-term temporal information is not fully stressed by these methods. In this work, we propose the…

Cited by 115 publications (81 citation statements)
References 28 publications
“…Unlike static images, videos have rich temporal information. In order to benefit from the temporal clues in videos, researchers developed several methods to aggregate information locally and globally using two or more frames [46], [47], [48], [49]. Similarly, we extend HoughNet with a new temporal voting module to incorporate temporal information using an additional (auxiliary) frame.…”
Section: Spatio-temporal Voting (mentioning)
confidence: 99%
“…Furthermore, external memories can benefit the long-term information storage, which can be useful for feature aggregation in the video domain [58], [59]. Besides, some techniques integrate detection trackers to exploit temporal information between keyframe processing [60]-[62].…”
Section: B Feature Aggregation Over Time (mentioning)
confidence: 99%
“…Full-sequence level feature aggregation is proposed in [42] to generate robust features for video object detection. External memory is used in [44] to store informative temporal features. In [43], speed-accuracy tradeoff for video object detection is studied.…”
Section: B Video Object Detection (mentioning)
confidence: 99%