2020
DOI: 10.1007/s11263-020-01355-6
Zero-Shot Object Detection: Joint Recognition and Localization of Novel Concepts

Cited by 55 publications (16 citation statements)
References 53 publications
“…Zero-shot and open-vocabulary object detection. Zero-shot object detection aims at detecting novel object classes which are not seen during detector training [2,18,38,39,59,64]. Bansal et al. [2] learned to match the visual features of cropped image regions to word embeddings using a max-margin loss.…”
Section: Related Work
confidence: 99%
“…In zero-shot classification [28,29] and recognition [30,31,32,33], word embeddings commonly replace learnable class prototypes to transfer from training classes to unseen classes using inherent semantic relationships extracted from text corpora. Commonly used word embeddings are GloVe vectors [29,30] and word2vec embeddings [31,34,33,35]; however, embeddings learnt from image-text pairs with CLIP [36] achieve the best zero-shot performance so far [32]. Rahman et al. [31] argue that a single word embedding per class is insufficient to model the visual-semantic relationships and propose to learn class representations as weighted word embeddings of synonyms and related terms [31].…”
Section: Knowledge-based Embeddings
confidence: 99%
“…Nevertheless, pure text embeddings perform consistently best for training classes [30,31,32,33,35,32] in object detection. The projection from visual to semantic space is done by a linear layer [30,37,35], a single- [31,34,33] or two-layer MLP [32], and is learned with max-margin losses [30,31,38,37], a softplus-margin focal loss [35], or a cross-entropy loss [33,32]. Zhang et.
Section: Knowledge-based Embeddings
confidence: 99%
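The recipe in the statement above — project region features into the word-embedding space with a linear layer, then train with a max-margin ranking objective so each region scores highest with its own class embedding — can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the function name, the margin value, and the use of cosine similarity are illustrative choices, not the exact formulation of any of the cited papers.

```python
import numpy as np

def max_margin_loss(visual_feats, word_embs, labels, W, margin=0.2):
    """Hypothetical sketch of a visual-to-semantic max-margin loss.

    visual_feats: (N, d_vis) pooled region features
    word_embs:    (C, d_sem) fixed class embeddings (e.g. GloVe/word2vec)
    labels:       (N,) ground-truth class index per region
    W:            (d_vis, d_sem) linear projection, the learned parameter
    """
    # Project visual features into the semantic (word-embedding) space.
    proj = visual_feats @ W                                   # (N, d_sem)
    # Cosine similarity between projected features and all class embeddings.
    proj = proj / np.linalg.norm(proj, axis=1, keepdims=True)
    embs = word_embs / np.linalg.norm(word_embs, axis=1, keepdims=True)
    scores = proj @ embs.T                                    # (N, C)
    pos = scores[np.arange(len(labels)), labels]              # true-class score
    # Hinge against every wrong class: max(0, margin - pos + neg).
    loss = np.maximum(0.0, margin - pos[:, None] + scores)
    loss[np.arange(len(labels)), labels] = 0.0                # drop true-class term
    return loss.mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 512))          # four region features
embs = rng.normal(size=(10, 300))          # ten class word embeddings
W = rng.normal(size=(512, 300)) * 0.01     # untrained projection
labels = np.array([1, 3, 3, 7])
print(max_margin_loss(feats, embs, labels, W))
```

Because the class "prototypes" are fixed text embeddings rather than learned weights, the same scoring rule transfers at test time to unseen classes: one simply swaps in (or appends) the word embeddings of the novel classes.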
“…In this paper, we perform both transductive and inductive ZSL and GZSL on 3D point cloud objects. Zero-Shot Learning: For the ZSL task, there has been significant progress, including on image recognition [43,74,1,3,37,25,65], multi-label ZSL [26,42], and zero-shot detection [44]. Despite this progress, these methods solve the constrained problem where the test instances are restricted to only unseen classes, rather than being from either seen or unseen classes.…”
Section: Related Work
confidence: 99%