2023
DOI: 10.1016/j.neucom.2023.03.009

Implicit and explicit attention mechanisms for zero-shot learning

Cited by 8 publications (2 citation statements)
References 42 publications
“…To get further insights into our model, we report the results of DUET with ViT-base (Dosovitskiy et al. 2021) as the vision encoder. Remarkably, since the released ViT-base is pre-trained on ImageNet-21K, which may contain unseen objects, we only select 2 recent ViT-based ZSL methods, ViT-ZSL (Alamri and Dutta 2021) and IEAM-ZSL (Narayan et al. 2020), for comparison. As shown in Figure 4(b), DUET surpasses these two methods by a large margin (14.1% improvement on U and 10.3% improvement on H) and also exceeds our SOTA performance (H) by 4.8%.…”
Section: Overall Results (citation type: mentioning; classifier confidence: 99%)
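
For context on the U and H numbers quoted above: in generalized zero-shot learning, U and S are the per-class accuracies on unseen and seen classes, and H is their harmonic mean. A minimal sketch of that metric follows; the accuracy values are placeholders for illustration, not figures from the cited papers.

def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """Harmonic mean H of seen (S) and unseen (U) per-class accuracies,
    the standard summary metric in generalized ZSL."""
    if seen_acc + unseen_acc == 0:
        return 0.0
    return 2 * seen_acc * unseen_acc / (seen_acc + unseen_acc)

# Placeholder accuracies, for illustration only.
S, U = 0.70, 0.55
print(f"H = {harmonic_mean(S, U):.3f}")  # H = 0.616

Because H is a harmonic mean, it rewards methods that balance seen and unseen accuracy, which is why both U and H are reported in the statement above.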
“…Recently, vision-language models 4,37–41 have demonstrated impressive performance on image recognition tasks by learning generic visual representations via prompting techniques. Among them, CLIP 4 is a pioneering work.…”
Section: Related Work (citation type: mentioning; classifier confidence: 99%)
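
The prompting idea mentioned in that statement can be illustrated with a short sketch of CLIP-style zero-shot classification, assuming the Hugging Face transformers implementation of CLIP; the checkpoint name, class labels, and dummy image below are illustrative only and do not come from the cited papers.

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative public checkpoint; any CLIP checkpoint works the same way.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Class names wrapped in natural-language prompts (the "prompting technique").
class_names = ["zebra", "horse", "okapi"]            # illustrative labels
prompts = [f"a photo of a {c}" for c in class_names]

image = Image.new("RGB", (224, 224))                 # dummy image for the sketch

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
# Image-text similarity scores; softmax over prompts gives zero-shot class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(class_names, probs[0].tolist())))

The model is never trained on the listed classes; recognition comes entirely from matching the image embedding against the text embeddings of the prompts, which is the "generic visual representation via prompting" behaviour the citing paper refers to.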