2018
DOI: 10.1109/tpami.2017.2762295

Zero-Shot Learning Using Synthesised Unseen Visual Data with Diffusion Regularisation

Abstract: Sufficient training examples are the fundamental requirement for most learning tasks. However, collecting well-labelled training examples is costly. Inspired by Zero-Shot Learning (ZSL), which can use visual attributes or natural language semantics as an intermediate-level clue to associate low-level features with high-level classes, and in a novel extension of this idea, we aim to synthesise training data for novel classes using only semantic attributes. Despite the simplicity of this idea, t…
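To make the core idea concrete, here is a minimal sketch (not the paper's actual method, which adds diffusion regularisation): fit a ridge-regression map from class attributes to class-mean visual features on seen classes, synthesise pseudo-features for unseen classes from their attributes alone, and classify by nearest synthesised prototype. All dimensions and array names below are illustrative.

```python
# Minimal sketch of attribute-to-feature synthesis for ZSL.
# Not the paper's exact method; all sizes/names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 5 seen classes, 2 unseen classes, 85-d attributes, 2048-d features.
A_seen = rng.normal(size=(5, 85))      # attribute vectors of seen classes
A_unseen = rng.normal(size=(2, 85))    # attribute vectors of unseen classes
F_seen = rng.normal(size=(5, 2048))    # mean visual features of seen classes

# Ridge regression from attribute space to visual feature space:
# W = argmin ||A W - F||^2 + lam ||W||^2  =>  W = (A^T A + lam I)^-1 A^T F
lam = 1.0
W = np.linalg.solve(A_seen.T @ A_seen + lam * np.eye(85), A_seen.T @ F_seen)

# Synthesise visual prototypes for unseen classes from their attributes alone.
F_unseen_synth = A_unseen @ W

def classify(x):
    """Assign a test feature x to the nearest synthesised unseen prototype."""
    dists = np.linalg.norm(F_unseen_synth - x, axis=1)
    return int(np.argmin(dists))

print(classify(rng.normal(size=2048)))
```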


Cited by 103 publications (67 citation statements: 1 supporting, 66 mentioning, 0 contrasting)
References 56 publications
“…This network is composed of a generative model G : A × Z → X (parameterized by θ_G) that produces a visual representation x given its semantic feature a and a noise vector z ∼ N(0, I) sampled from a multidimensional centred Gaussian, and a discriminative model D : X × A → [0, 1] (parameterized by θ_D) that tries to distinguish whether the input x and its semantic representation a represent a true or generated visual representation and respective semantic feature. Note that while the method developed by Yan et al. [28] concerns the generation of realistic images, our proposed approach, similarly to [1,8,9], aims to generate visual representations, such as the features from a deep residual network [26]; the strategy based on visual representations has been shown to produce more accurate GZSL classification results compared to the use of realistic images. The training algorithm for estimating θ_G and θ_D follows a minimax game, where G(.)…”
Section: f-CLSWGAN (mentioning)
confidence: 99%
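The quoted architecture can be sketched as a small conditional GAN over features rather than images. The following is an assumed PyTorch illustration, not the cited paper's code: the layer widths, dimensions, and plain BCE minimax loss are simplifications (f-CLSWGAN itself uses a Wasserstein objective with gradient penalty plus a classification loss).

```python
# Sketch of the conditional generator/discriminator quoted above.
# Layer widths, dimensions, and loss are illustrative assumptions.
import torch
import torch.nn as nn

ATT_DIM, NOISE_DIM, FEAT_DIM = 85, 128, 2048  # assumed sizes

class Generator(nn.Module):
    """G : A x Z -> X, maps (attribute a, noise z) to a visual feature x."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ATT_DIM + NOISE_DIM, 4096), nn.LeakyReLU(0.2),
            nn.Linear(4096, FEAT_DIM), nn.ReLU())  # ReLU mimics ResNet features

    def forward(self, a, z):
        return self.net(torch.cat([a, z], dim=1))

class Discriminator(nn.Module):
    """D : X x A -> [0, 1], scores whether (x, a) is a real pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM + ATT_DIM, 4096), nn.LeakyReLU(0.2),
            nn.Linear(4096, 1), nn.Sigmoid())

    def forward(self, x, a):
        return self.net(torch.cat([x, a], dim=1))

# One minimax step with a standard GAN loss.
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCELoss()

x_real = torch.randn(32, FEAT_DIM)   # real features (toy stand-in)
a = torch.randn(32, ATT_DIM)         # matching class attributes
z = torch.randn(32, NOISE_DIM)       # z ~ N(0, I)

# Discriminator step: real pairs -> 1, generated pairs -> 0.
x_fake = G(a, z).detach()
loss_d = (bce(D(x_real, a), torch.ones(32, 1))
          + bce(D(x_fake, a), torch.zeros(32, 1)))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool the discriminator on generated pairs.
loss_g = bce(D(G(a, z), a), torch.ones(32, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```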
“…Our ZSL method also produces results comparable to some of its supervised counterparts. Another interesting direction to note is that, while high accuracies can potentially be obtained using the recently proposed data generation models [23,24,25,26], these works are orthogonal to the proposed method, and, in principle, these techniques can be used in combination with the ZSL model proposed in this work. We plan to investigate this line of research in future work.…”
Section: Results (mentioning)
confidence: 98%
“…Examples include those based on domain adaptation [19], semantic class prototype graph [20], unbiased embedding space [21], class prototype rectification [8] or knowledge graphs [22]. Recently, a few approaches have been proposed to use generative models for zero-shot learning [23,24,25,26].…”
Section: Visual Features, Visually Meaningful Word Representation (mentioning)
confidence: 99%
“…These approaches learn transformations from the semantic space to the visual space and perform image recognition in the visual space, which can effectively tackle the hubness problem in ZSL [8,29]. [5,21,40] predict visual samples by learning embedding functions from the semantic space to the visual space. [25] adds regularizers to learn the embedding function from class semantics to the corresponding visual classifiers, and [35] utilizes knowledge graphs to learn the same embedding functions.…”
Section: Visual-Semantic Transformations (mentioning)
confidence: 99%
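A minimal sketch of the semantic-to-visual direction described in this snippet: learn a linear map from class semantics into visual space on seen-class data, then classify a test feature by its nearest mapped class prototype in visual space, the direction that mitigates hubness. All data and names below are synthetic and illustrative.

```python
# Semantic -> visual embedding sketch (synthetic data; not any cited paper's code).
import numpy as np

rng = np.random.default_rng(1)
n, d_vis, d_sem = 200, 512, 50
X = rng.normal(size=(n, d_vis))        # visual features of training samples
labels = rng.integers(0, 10, size=n)   # 10 seen classes
S = rng.normal(size=(10, d_sem))       # class semantic vectors
Y = S[labels]                          # per-sample semantic targets

# Ridge map W_sv from semantic space into visual space, where recognition
# is performed (the direction favoured above to reduce hubness).
lam = 1.0
W_sv = np.linalg.solve(Y.T @ Y + lam * np.eye(d_sem), Y.T @ X)

def predict(x, class_semantics):
    """Nearest class prototype in *visual* space, by cosine similarity."""
    protos = class_semantics @ W_sv                       # classes -> visual space
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return int(np.argmax(protos @ (x / np.linalg.norm(x))))

print(predict(X[0], S))
```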