2020
DOI: 10.48550/arxiv.2003.06798
Preprint

StarNet: towards Weakly Supervised Few-Shot Object Detection

Abstract: In this paper, we propose a new few-shot learning method called StarNet, which is an end-to-end trainable non-parametric star-model few-shot classifier. While being meta-trained using only image-level class labels, StarNet learns not only to predict the class labels for each query image of a few-shot task, but also to localize (via a heatmap) what it believes to be the key image regions supporting its prediction, thus effectively detecting the instances of the novel categories. The localization is enabled by the…

Cited by 3 publications (4 citation statements)
References 37 publications
“…To our best knowledge, instead of designing a distinctly new detection framework from scratch, FSOD appends the process of meta knowledge extraction and sharing on classic deep learning object detection baselines, such as two-stage Faster R-CNN [11,16,17,72,73,78,83–98], one-stage YOLO [10,74,99], CenterNet [100,101], and Vision Transformer [102]. The numerous published reports indicate that the two-stage network is more favored, and the two-stage network has more advantages because of its higher detection accuracy, more interpretive, and extensible network structure.…”
Section: Model: Semantics Extraction and Cross-domain Mapping
Citation type: mentioning
confidence: 99%
“…A few works [36,11,16] have tried XAI for FSL tasks. Geng et al [11] uses a knowledge graph to make an explanation for zero-shot tasks.…”
Section: Explainable AI
Citation type: mentioning
confidence: 99%
“…Sun et al [36] adopts layerwise relevance propagation (LRP) [1] to explain the output of a classifier. StarNet [16] realize visualization through heat maps derived from back-project. Recently, a new type of XAI, coined SCOUTER [20], has been proposed, which applies the self-attention mechanism [38] to the classifier.…”
Section: Explainable AI
Citation type: mentioning
confidence: 99%
“…Enhancements to Prototypical Networks (ProtoNets) [29] in particular have been proposed for difficult few-shot tasks involving, for instance, inhomogeneous noisy datasets [8], domain adaptation [24], and relation classification in text [10]. Several works have incorporated localization conditioning into a few-shot classification architecture [12,17,18,26,32]. We include comparison with the Few-Shot Localization (FSL) method presented in [32] in our experimental evaluations.…”
Section: Introduction
Citation type: mentioning
confidence: 99%