Zero-Shot Object Detection by Hybrid Region Embedding

Demirel, Berkan; Cinbiş, Ramazan Gökberk; Ikizler-Cinbis, Nazli

doi:10.48550/arxiv.1805.06157

Cited by 5 publications (12 citation statements); references 37 publications.


“…They also propose a generalized version of ZSD, called generalized zero-shot object detection (GZSD), which aims to detect seen and unseen objects together. Demirel et al. [6] adopt a hybrid region embedding to improve performance. Zhu et al. [8] introduce ZS-YOLO, which is built on the one-step YOLOv2 [48] detector.…”

Section: Related Work (mentioning)

confidence: 99%

“…Training Process. Compared with previous approaches [5,6,7,8], which require multi-step training and weights pre-trained on seen or unseen data, our model is simple and convenient to train in a two-step manner. Loss Function.…”

Section: Learning (mentioning)

confidence: 99%

“…To simultaneously localize and recognize unseen objects, some preliminary attempts [5,6,7,8] for zero-shot object detection (ZSD) have been reported. ZSD introduces a more practical setting to detect novel objects that are not observed during training.…”

Section: Introduction (mentioning)

confidence: 99%


Background Learnable Cascade for Zero-Shot Object Detection

Zheng, Huang, Han et al., 2020. Preprint.


Zero-shot detection (ZSD) is crucial to large-scale object detection, with the aim of simultaneously localizing and recognizing unseen objects. Several challenges remain for ZSD, including reducing the ambiguity between background and unseen objects and improving the alignment between visual and semantic concepts. In this work, we propose a novel framework named Background Learnable Cascade (BLC) to improve ZSD performance. The major contributions of BLC are as follows: (i) we propose a multi-stage cascade structure named Cascade Semantic R-CNN to progressively refine the visual-semantic alignment for ZSD; (ii) we develop a semantic information flow structure and add it directly between the stages of Cascade Semantic R-CNN to further improve semantic feature learning; (iii) we propose a background learnable region proposal network (BLRPN) to learn an appropriate word vector for the background class and use this learned vector in Cascade Semantic R-CNN; this design makes the background "learnable" and reduces the confusion between background and unseen classes. Our extensive experiments show that BLC obtains significant performance improvements on MS-COCO over state-of-the-art methods.
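Contribution (iii) replaces a fixed background word vector with a learned one. The sketch below does not reproduce BLRPN or Cascade Semantic R-CNN; it only illustrates, under simplifying assumptions, the generic idea of treating the background embedding as a trainable parameter. A projected region feature `z` is scored by dot product against the background vector and the seen-class word vectors, and one SGD step on the softmax cross-entropy for a background-labeled region updates only the background vector. All names (`class_probs`, `update_background`) and the toy 2-D vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def class_probs(z, background_vec, class_vecs):
    """Score region feature z against the background vector (index 0)
    and the seen-class word vectors, then normalize with softmax."""
    scores = [dot(z, background_vec)] + [dot(z, v) for v in class_vecs]
    return softmax(scores)

def update_background(z, background_vec, class_vecs, lr=0.1):
    """One SGD step on -log p(background | z), updating only the background vector.
    The gradient of -log p_bg with respect to background_vec is (p_bg - 1) * z."""
    p_bg = class_probs(z, background_vec, class_vecs)[0]
    return [b - lr * (p_bg - 1.0) * zi for b, zi in zip(background_vec, z)]

# Toy example: two seen-class word vectors and one region labeled as background.
class_vecs = [[1.0, 0.0], [0.0, 1.0]]
z_background = [-1.0, -1.0]
bg = [0.0, 0.0]  # hypothetical initialization; a fixed vector would never move

p_before = class_probs(z_background, bg, class_vecs)[0]
for _ in range(50):
    bg = update_background(z_background, bg, class_vecs)
p_after = class_probs(z_background, bg, class_vecs)[0]
print(f"p(background) before: {p_before:.3f}, after: {p_after:.3f}")
```

With a fixed background vector the background probability for this region would stay where it started; letting the vector move raises it, which is the intuition behind making the background class learnable.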


“…The authors in [3] incorporate an improved semantic mapping for the background in an iterative manner, by first projecting the seen-class visual features to their corresponding semantics and then projecting the background bounding boxes to a set of diverse unseen semantic vectors. [4] learns an embedding space as a convex combination of training-class word vectors. [5] uses a recurrent neural network to model natural language descriptions of objects in the image.…”

Section: Related Work (mentioning)

confidence: 99%
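The snippet describes [4] as embedding regions as a convex combination of training-class word vectors. The sketch below illustrates only that convex-combination idea (in the spirit of ConSE-style embeddings) and is not presented as the full method of [4]. It assumes a detector already produces scores over seen classes, turns them into softmax weights, averages the seen-class word vectors with those weights, and assigns the region to the unseen class with the most similar word vector. The 2-D vectors are toy values, not real word embeddings, and the function names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def convex_embedding(seen_scores, seen_wordvecs):
    """Embed a region as the softmax-weighted average of seen-class word vectors.
    The weights are non-negative and sum to 1, so the result is a convex combination."""
    weights = softmax(seen_scores)
    dim = len(seen_wordvecs[0])
    return [sum(w * vec[d] for w, vec in zip(weights, seen_wordvecs)) for d in range(dim)]

def predict_unseen(seen_scores, seen_wordvecs, unseen_wordvecs):
    """Index of the unseen class whose word vector is most similar to the region embedding."""
    emb = convex_embedding(seen_scores, seen_wordvecs)
    sims = [cosine(emb, u) for u in unseen_wordvecs]
    return max(range(len(sims)), key=lambda i: sims[i])

# Toy 2-D "word vectors" (purely illustrative).
seen = [[1.0, 0.0], [0.0, 1.0]]
unseen = [[0.9, 0.1], [0.1, 0.9]]
print(predict_unseen([5.0, 0.0], seen, unseen))  # → 0
```

Because the embedding always lies inside the convex hull of the seen-class vectors, no extra projection parameters are needed beyond the seen-class classifier.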

“…ZSD is commonly accomplished by learning to project visual representations of different objects to a pre-defined semantic embedding space, and then performing nearest neighbor search in the semantic space at inference [2,3,4,5]. Since the unseen examples are never visualized during training, the model becomes significantly biased toward the seen objects [6,7], leading to problems such as confusion with the background and mode collapse, where only a few unseen classes receive high scores.…”

Section: Introduction (mentioning)

confidence: 99%
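The projection-then-nearest-neighbor pipeline in the snippet above can be sketched as follows. This is a minimal illustration, assuming a linear visual-to-semantic map `W` that has already been learned on seen-class regions (its values here are hypothetical) and cosine similarity as the nearest-neighbor criterion; the class names and 2-D vectors are toy placeholders.

```python
import math

def project(feature, W):
    """Linear visual-to-semantic projection; W has one row per semantic dimension."""
    return [sum(w * f for w, f in zip(row, feature)) for row in W]

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def nearest_class(feature, W, class_embeddings):
    """Nearest-neighbor search (by cosine similarity) in the semantic space."""
    z = project(feature, W)
    return max(class_embeddings, key=lambda name: cosine(z, class_embeddings[name]))

W = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]              # hypothetical learned projection (2 x 3)
classes = {"zebra": [1.0, 0.0], "bus": [0.0, 1.0]}  # toy word vectors
print(nearest_class([2.0, 0.0, 0.0], W, classes))   # → zebra
```

At inference the candidate set can include unseen-class vectors without changing `W`; since `W` is fit only on seen-class regions, nothing in this procedure by itself counters the seen-class bias the snippet describes.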

Synthesizing the Unseen for Zero-shot Object Detection

Hayat, Hayat, Rahman et al., 2020. Preprint.


The existing zero-shot detection approaches project visual features to the semantic domain for seen objects, hoping to map unseen objects to their corresponding semantics during inference. However, since the unseen objects are never visualized during training, the detection model is skewed towards seen content, thereby labeling unseen objects as background or as a seen class. In this work, we propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain. Consequently, the major challenge becomes how to accurately synthesize unseen objects using only their class semantics. Towards this ambitious goal, we propose a novel generative model that uses class semantics not only to generate the features but also to discriminatively separate them. Further, using a unified model, we ensure that the synthesized features have high diversity, representing the intra-class differences and the variable localization precision of the detected bounding boxes. We test our approach on three object detection benchmarks, PASCAL VOC, MSCOCO, and ILSVRC detection, under both conventional and generalized settings, showing impressive gains over the state-of-the-art methods. Our code is available at https://github.com/nasir6/zero_shot_detection


Adaptive adjustment with semantic embedding for zero-shot object detection

Shi, Tan et al., 2023. J. Electron. Imag.


Traditional zero-shot object-detection algorithms detect objects of classes not seen during training with the help of semantic embeddings. However, these approaches may perform poorly due to the limitations of fixed semantic embeddings. Because fixed semantic attributes limit the model's ability to generalize, a semantic enhancement mechanism is proposed to update the semantic embedding so that it serves the needs of the visual space. Specifically, because the original semantic space is not sufficient to construct a visual-semantic mapping, an augmented semantic embedding (ASE) approach is designed to supplement semantic attribute information. Then, a semantic channel attention mechanism is used to adjust the ASE. This adjustment strategy retains adequate attribute information that is highly relevant to visual features. Finally, to alleviate the domain shift problem, a clustering association strategy is introduced to establish an inferred relationship, which ensures that the predictor generalizes to the unseen domain during training. The superiority of the proposed method is demonstrated on the MS-COCO and PASCAL VOC datasets.
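The abstract does not say how the semantic channel attention is computed. As a rough illustration only, the sketch below assumes a simplified sigmoid gate in the spirit of squeeze-and-excitation channel attention: each semantic channel is rescaled by a gate in (0, 1) computed from the visual feature, so channels judged irrelevant to the visual evidence are attenuated. The weight matrix `W`, the function name, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(semantic, visual, W):
    """Gate each semantic channel with a sigmoid of a linear function of the visual feature.
    W has one (hypothetical, learned) weight row per semantic channel."""
    gates = [sigmoid(sum(w * v for w, v in zip(row, visual))) for row in W]
    return [g * s for g, s in zip(gates, semantic)]

semantic = [1.0, 1.0]  # toy semantic embedding for one class
visual = [1.0]         # toy visual feature
W = [[10.0], [-10.0]]  # keeps channel 0, suppresses channel 1
print(channel_attention(semantic, visual, W))
```

Since every gate lies strictly between 0 and 1, the adjusted embedding can only shrink channels, never amplify them; a real model would learn `W` jointly with the detector.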
