2018
DOI: 10.1109/tpami.2017.2745563
Crafting GBD-Net for Object Detection

Abstract: The visual cues from multiple support regions of different sizes and resolutions are complementary in classifying a candidate box in object detection. Effective integration of local and contextual visual cues from these regions has become a fundamental problem in object detection. In this paper, we propose a gated bi-directional CNN (GBD-Net) to pass messages among features from different support regions during both feature learning and feature extraction. Such message passing can be implemented through convol…
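The gated bi-directional message passing described in the abstract can be sketched in miniature as follows. This is a toy illustration only, not the paper's architecture: the helper `gated_message`, the weight matrices `W_msg` and `W_gate`, and the use of 1×1-convolution-style channel mixing (rather than the spatial convolutions GBD-Net actually uses) are all assumptions made for compactness.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_message(h_src, h_dst, W_msg, W_gate):
    """Pass a gated message from source features to destination features.

    h_src, h_dst: feature maps of shape (C, H, W) from two support regions.
    W_msg, W_gate: (C, C) channel-mixing matrices (a 1x1-conv stand-in).
    """
    # message: linear map over channels, then ReLU
    msg = np.maximum(np.einsum('chw,dc->dhw', h_src, W_msg), 0.0)
    # gate in [0, 1], computed from the source features, controls how much
    # of the message the destination accepts
    gate = sigmoid(np.einsum('chw,dc->dhw', h_src, W_gate))
    # gated residual update of the destination features
    return h_dst + gate * msg

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
h_small = rng.standard_normal((C, H, W))  # features from a small support region
h_large = rng.standard_normal((C, H, W))  # features from a larger, contextual region
W_msg = rng.standard_normal((C, C)) * 0.1
W_gate = rng.standard_normal((C, C)) * 0.1

# bi-directional: each region's features are refined by the other's
h_small_new = gated_message(h_large, h_small, W_msg, W_gate)
h_large_new = gated_message(h_small, h_large, W_msg, W_gate)
print(h_small_new.shape, h_large_new.shape)
```

The gate is the key design point: because it is computed from the incoming features, unreliable context (e.g. from a heavily occluded region) can be attenuated toward zero instead of corrupting the destination features.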

Cited by 125 publications (93 citation statements)
References 48 publications
“…It has been recognized that contextual information (object relations, global scene statistics) helps object detection and recognition [197], especially for small objects, occluded objects, and with poor image quality. There was extensive work preceding deep learning [185,193,220,58,78], and also quite a few works in the era of deep learning [82,304,305,35,114]. How to efficiently and effectively incorporate contextual information remains to be explored, possibly guided by how human vision uses context, based on scene graphs [161], or via the full segmentation of objects and scenes using panoptic segmentation [134].…”
Section: Summary and Discussion (mentioning)
confidence: 99%
“…Fig. 18 Representative approaches that explore local surrounding contextual features: MRCNN [82], GBDNet [304,305], ACCNN [157] and CoupleNet [327]; also see Table 8.…”
Section: Local Context (mentioning)
confidence: 99%
“…B. Implementation details 1) Training schemes and setting: For the visual-displacement and visual-similarity CNNs, we adopt ResNet-101 [26], [32] as the network structure and replace the topmost layer to output displacement confidence or same-object confidence. Both CNNs are pretrained on the ImageNet dataset.…”
Section: A. Datasets and Evaluation Metric (mentioning)
confidence: 99%
“…Ensemble learning is often used to boost results, as observed in recent competition entries [3,21,24,33,34]. We implement an ensemble of our temporal model by training the same model several times with different initializations and averaging their predictions.…”
Section: Ensemble Learning (mentioning)
confidence: 99%