Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413503
Learning Modality-Invariant Latent Representations for Generalized Zero-shot Learning

Abstract: Recently, feature generating methods have been successfully applied to zero-shot learning (ZSL). However, most previous approaches only generate visual representations for zero-shot recognition. In fact, typical ZSL is a classic multi-modal learning protocol which consists of a visual space and a semantic space. In this paper, therefore, we present a new method which can simultaneously generate both visual representations and semantic representations so that the essential multi-modal information associated wit…
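The abstract's central idea — generating a visual representation and a semantic representation from one shared latent code, so both modalities stay tied to the same class — can be sketched minimally as below. This is an illustration only, not the paper's implementation: the decoder matrices are random stand-ins (a real model would learn them), and all dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two decoders share ONE latent code z: one maps it to a visual feature,
# the other to a semantic (attribute) vector. Random matrices stand in
# for the learned decoders of a feature-generating model.
D_vis = rng.standard_normal((8, 4))  # latent(4) -> visual(8)
D_sem = rng.standard_normal((3, 4))  # latent(4) -> semantic(3)

def generate_pair(z):
    """Decode one latent code into both modalities at once."""
    return D_vis @ z, D_sem @ z

v, s = generate_pair(rng.standard_normal(4))
print(v.shape, s.shape)  # (8,) (3,)
```

Because both outputs are functions of the same z, any structure imposed on the latent space (e.g. class-conditioning) is shared across modalities by construction.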

Cited by 37 publications (16 citation statements) · References 34 publications
“…CADA-VAE (Schönfeld et al 2019) combines cross-modal and cross-reconstruction, and leverages two independent VAEs to align features via minimizing the distance between the latent distributions of visual and semantic features. Based on CADA-VAE, Li et al (2020) learn modality-invariant latent representations by maximizing mutual information and entropy in the latent space. DE-VAE (Ma and Hu 2020) adopts a deep embedding model to learn the mapping from the semantic space to the visual feature space.…”
Section: Cross-modal Alignment Models for GZSL
confidence: 99%
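The alignment idea quoted above — pulling the latent distributions of the visual and semantic VAEs together — is often realized with a closed-form distance between diagonal Gaussians. The sketch below uses the squared 2-Wasserstein distance between diagonal Gaussians as such an alignment loss; this is a generic illustration with made-up encoder outputs, not the loss from any specific paper.

```python
import numpy as np

def w2_diag_gaussians(mu1, var1, mu2, var2):
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    Driving this to zero makes the two latent distributions coincide,
    which is the goal of cross-modal latent alignment.
    """
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return mean_term + cov_term

# Hypothetical encoder outputs for one class: visual vs. semantic latents.
mu_vis, var_vis = np.array([0.5, -0.2]), np.array([1.0, 0.5])
mu_sem, var_sem = np.array([0.5, -0.2]), np.array([1.0, 0.5])
print(w2_diag_gaussians(mu_vis, var_vis, mu_sem, var_sem))  # identical → 0.0
```

In a trained system this scalar would be added to the two VAEs' reconstruction and KL terms and minimized jointly.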
“…Some state-of-the-art GZSL approaches [18], [19], [20], [21], [22], [23] use generative models, such as generative adversarial nets (GANs) [24], variational autoencoders (VAEs) [25], or different variants of hybrid GAN/VAEs, to find an alignment between class-level semantic descriptions and visual representations. As such, the zero-shot learning problem becomes a traditional supervised classification task.…”
Section: Unseen Classes
confidence: 99%
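The reduction described above — synthesize features for unseen classes from their attributes, then classify as usual — can be sketched end to end. Everything here is a toy stand-in: the "generator" is a fixed linear map plus noise (a real system trains a GAN or VAE for it), and the class names and attribute vectors are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two seen classes plus one unseen class known only via its attributes.
attrs = {"cat": np.array([1.0, 0.0]),
         "dog": np.array([0.0, 1.0]),
         "zebra": np.array([1.0, 1.0])}   # unseen

# Stand-in generator: linear map from attributes to a 3-dim feature space.
W = np.array([[2.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])

def generate(attr, n=100):
    """Synthesize n noisy visual features conditioned on an attribute vector."""
    return attr @ W.T + 0.1 * rng.standard_normal((n, 3))

# With synthetic features for every class, GZSL reduces to ordinary
# supervised classification — here, nearest centroid.
centroids = {c: generate(a).mean(axis=0) for c, a in attrs.items()}

def classify(x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# A query near the zebra prototype [2, 2, 2] is recognized despite
# "zebra" having no real training images.
print(classify(np.array([2.0, 2.0, 2.0])))  # → zebra
```

The key point the quoted passage makes is exactly this last step: once unseen-class features exist, any off-the-shelf classifier applies.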
“…Imrattanatra et al [16] propose an embedding model based on a knowledge graph. Such methods resort to learning a projection from visual space to semantic space [16,17,21,38,53] or the reverse [36,58]. Then, ZSL can be accomplished by ranking similarity or compatibility in the shared space.…”
Section: Related Work 4.1 Zero-shot Learning
confidence: 99%
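The projection-and-ranking scheme in the last quote can be sketched as follows. The projection matrix here is a fixed stand-in (a real method would fit it, e.g. by ridge regression on seen classes), and the attribute vectors are hypothetical.

```python
import numpy as np

# Hypothetical class attribute vectors spanning the shared semantic space.
class_attrs = np.array([[1.0, 0.0],   # class 0
                        [0.0, 1.0],   # class 1
                        [1.0, 1.0]])  # class 2 (no training images)

# Stand-in for a learned visual→semantic projection.
P = np.array([[0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])  # 3-dim visual features → 2-dim attributes

def rank_classes(visual_feat):
    """Project a visual feature into semantic space, then rank all
    classes by cosine similarity to their attribute vectors."""
    s = P @ visual_feat
    sims = class_attrs @ s / (np.linalg.norm(class_attrs, axis=1)
                              * np.linalg.norm(s))
    return np.argsort(-sims)  # best match first

feat = np.array([2.0, 2.0, 0.0])   # projects to [1, 1]
print(rank_classes(feat)[0])       # → 2
```

Recognition of unseen classes falls out for free: class 2 never contributed training images, yet it can win the ranking because its attribute vector lives in the same space the features are projected into.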