2023
DOI: 10.1016/j.neucom.2023.03.009

Implicit and explicit attention mechanisms for zero-shot learning

Cited by 8 publications (2 citation statements)
References 42 publications
“…To get further insights into our model, we report the results of DUET with ViT-base (Dosovitskiy et al. 2021) as the vision encoder. Remarkably, since the released ViT-base is pre-trained on ImageNet-21K, which may contain unseen objects, we only select 2 recent ViT-based ZSL methods, ViT-ZSL (Alamri and Dutta 2021) and IEAM-ZSL (Narayan et al. 2020), for comparison. As shown in Figure 4(b), DUET surpasses these two methods by a large margin (14.1% improvement on U and 10.3% improvement on H) and also exceeds our SOTA performance (H) by 4.8%.…”
Section: Overall Results (citation type: mentioning; classifier confidence: 99%)
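
For context on the U and H numbers quoted above: in generalized zero-shot learning, U and S are the per-class accuracies on unseen and seen classes, and H is their harmonic mean. A minimal sketch of that metric follows; the accuracy values are placeholders for illustration, not figures from the cited papers.

def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """Harmonic mean H of seen (S) and unseen (U) per-class accuracies,
    the standard summary metric in generalized ZSL."""
    if seen_acc + unseen_acc == 0:
        return 0.0
    return 2 * seen_acc * unseen_acc / (seen_acc + unseen_acc)

# Placeholder accuracies, for illustration only.
S, U = 0.70, 0.55
print(f"H = {harmonic_mean(S, U):.3f}")  # H = 0.616

Because H is a harmonic mean, it rewards methods that balance seen and unseen accuracy, which is why both U and H are reported in the statement above.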
“…Recently, vision-language models 4,37–41 have demonstrated impressive performance on image recognition tasks by learning generic visual representations via prompting techniques. Among them, CLIP 4 is a pioneering work.…”
Section: Related Work (citation type: mentioning; classifier confidence: 99%)
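
The prompting idea mentioned in that statement can be illustrated with a short sketch of CLIP-style zero-shot classification, assuming the Hugging Face transformers implementation of CLIP; the checkpoint name, class labels, and dummy image below are illustrative only and do not come from the cited papers.

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative public checkpoint; any CLIP checkpoint works the same way.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Class names wrapped in natural-language prompts (the "prompting technique").
class_names = ["zebra", "horse", "okapi"]            # illustrative labels
prompts = [f"a photo of a {c}" for c in class_names]

image = Image.new("RGB", (224, 224))                 # dummy image for the sketch

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
# Image-text similarity scores; softmax over prompts gives zero-shot class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(class_names, probs[0].tolist())))

The model is never trained on the listed classes; recognition comes entirely from matching the image embedding against the text embeddings of the prompts, which is the "generic visual representation via prompting" behaviour the citing paper refers to.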