Crafting GBD-Net for Object Detection

Zeng, Xingyu; Ouyang, Wanli; Yan, Junjie; Li, Hongsheng; Xiao, Tong; Wang, Kun; Liu, Yu; Zhou, Yucong; Yang, Bin; Wang, Zhe; Zhou, Hui; Wang, Xiaogang

doi:10.1109/tpami.2017.2745563

Cited by 125 publications

(93 citation statements)

References 48 publications

“…It has been recognized that contextual information (object relations, global scene statistics) helps object detection and recognition [197], especially for small objects, occluded objects, and with poor image quality. There was extensive work preceding deep learning [185,193,220,58,78], and also quite a few works in the era of deep learning [82,304,305,35,114]. How to efficiently and effectively incorporate contextual information remains to be explored, possibly guided by how human vision uses context, based on scene graphs [161], or via the full segmentation of objects and scenes using panoptic segmentation [134].…”

Section: Summary and Discussion (mentioning)

confidence: 99%

Deep Learning for Generic Object Detection: A Survey

et al. 2019

Self Cite

Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.

“…Fig. 18 Representative approaches that explore local surrounding contextual features: MRCNN [82], GBDNet [304,305], ACCNN [157] and CoupleNet [327]; also see Table 8.…”

Section: Local Context (mentioning)

confidence: 99%

Deep Learning for Generic Object Detection: A Survey

et al. 2019

Self Cite

“…B. Implementation details 1) Training schemes and setting: For visual-displacement and visual-similarity CNNs, we adopt ResNet-101 [26], [32] as the network structure and replace the topmost layer to output displacement confidence or same-object confidence. Both CNN are pretrained on the ImageNet dataset.…”

Section: A Datasets and Evaluation Metric (mentioning)

confidence: 99%

Deep Continuous Conditional Random Fields With Asymmetric Inter-Object Constraints for Online Multi-Object Tracking

Zhou, Ouyang, Cheng, et al. 2019

IEEE Trans. Circuits Syst. Video Technol.

Self Cite

Online Multi-Object Tracking (MOT) is a challenging problem and has many important applications including intelligence surveillance, robot navigation and autonomous driving. In existing MOT methods, individual object's movements and inter-object relations are mostly modeled separately and relations between them are still manually tuned. In addition, inter-object relations are mostly modeled in a symmetric way, which we argue is not an optimal setting. To tackle those difficulties, in this paper, we propose a Deep Continuous Conditional Random Field (DCCRF) for solving the online MOT problem in a track-by-detection framework. The DCCRF consists of unary and pairwise terms. The unary terms estimate tracked objects' displacements across time based on visual appearance information. They are modeled as deep Convolution Neural Networks, which are able to learn discriminative visual features for tracklet association. The asymmetric pairwise terms model inter-object relations in an asymmetric way, which encourages high-confidence tracklets to help correct errors of low-confidence tracklets and not to be affected by low-confidence ones much. The DCCRF is trained in an end-to-end manner for better adapting the influences of visual information as well as inter-object relations. Extensive experimental comparisons with state-of-the-arts as well as detailed component analysis of our proposed DCCRF on two public benchmarks demonstrate the effectiveness of our proposed MOT framework.

“…Ensemble learning is often used to boost the results, as observed in last competition works [3,21,24,33,34]. We implement an ensemble of our temporal model, by learning several times the same model with different initializations, and averaging their predictions.…”

Section: Ensemble Learning (mentioning)

confidence: 99%

An Occam's Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets

Vielzeuf, Kervadec, Pateux, et al. 2018

Proceedings of the 20th ACM International Conference on Multimodal Interaction

This paper presents a light-weight and accurate deep neural model for audiovisual emotion recognition. To design this model, the authors followed a philosophy of simplicity, drastically limiting the number of parameters to learn from the target datasets, always choosing the simplest learning methods: i) transfer learning and low-dimensional space embedding allows to reduce the dimensionality of the representations. ii) The visual temporal information is handled by a simple score-per-frame selection process, averaged across time. iii) A simple frame selection mechanism is also proposed to weight the images of a sequence. iv) The fusion of the different modalities is performed at prediction level (late fusion). We also highlight the inherent challenges of the AFEW dataset and the difficulty of model selection with as few as 383 validation sequences. The proposed real-time emotion classifier achieved a state-of-the-art accuracy of 60.64 % on the test set of AFEW, and ranked 4th at the Emotion in the Wild 2018 challenge.
