2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP)
DOI: 10.1109/globalsip45357.2019.8969272

Learning Product Codebooks Using Vector-Quantized Autoencoders for Image Retrieval

Abstract: Vector-Quantized Variational Autoencoders (VQ-VAE) [1] provide an unsupervised model for learning discrete representations by combining vector quantization and autoencoders. In this paper, we study the use of VQ-VAE for representation learning for downstream tasks, such as image retrieval. We first describe the VQ-VAE in the context of an information-theoretic framework. We show that the regularization term on the learned representation is determined by the size of the embedded codebook before the training and …
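As a reading aid, the sketch below illustrates the core quantization step a product-codebook VQ-VAE performs at the bottleneck: the encoder output is split into sub-vectors, and each sub-vector is snapped to the nearest entry of its own sub-codebook. The array shapes, codebook sizes, and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def product_quantize(z, codebooks):
    """Quantize an encoder output z with a product codebook.

    z         : (D,) latent vector produced by the encoder.
    codebooks : list of M arrays, each of shape (K, D // M); one
                sub-codebook per sub-vector (sizes are assumptions).
    Returns the quantized vector and the M selected code indices.
    """
    M = len(codebooks)
    sub_vectors = np.split(z, M)                  # split z into M sub-vectors
    indices, quantized = [], []
    for sub, book in zip(sub_vectors, codebooks):
        d = np.sum((book - sub) ** 2, axis=1)     # Euclidean distance to each codeword
        k = int(np.argmin(d))                     # nearest codeword
        indices.append(k)
        quantized.append(book[k])
    return np.concatenate(quantized), indices

# toy usage: 8-dim latent, M = 2 sub-codebooks with K = 4 entries each
rng = np.random.default_rng(0)
z = rng.normal(size=8)
codebooks = [rng.normal(size=(4, 4)) for _ in range(2)]
z_q, codes = product_quantize(z, codebooks)
print(codes)   # the discrete representation of z, e.g. [2, 1]
```

With M sub-codebooks of K entries each, the code for z occupies M·log2(K) bits, which is the sense in which, per the abstract, the codebook size fixes the regularization on the learned representation before training.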

Cited by 14 publications (6 citation statements) · References 12 publications

“…A well-trained model can encode useful information from images into the latent variable and will have a non-zero KL divergence term and a relatively small reconstruction term. However, straightforward training of a VAE can suffer from posterior collapse [45, 50], in which case it fails to make use of enough information. When posterior collapse occurs, the model ends up relying solely on the auto-regressive properties of the decoder while ignoring the latent variables, which become uninformative.…”
Section: Methods
confidence: 99%
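As background for the posterior-collapse remark above (not part of the quoted statement), the standard VAE training objective is the evidence lower bound, with encoder q_φ and decoder p_θ:

```latex
% Evidence lower bound (ELBO) maximized when training a VAE
\mathcal{L}(\theta,\phi;x)
  = \underbrace{\mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right]}_{\text{reconstruction term}}
  - \underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right)}_{\text{KL term}}
```

Posterior collapse is the degenerate solution in which the KL term is driven to zero, so q_φ(z|x) ≈ p(z) and the latent variable carries essentially no information about x; a sufficiently powerful auto-regressive decoder can then reconstruct x while ignoring z.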
“…Unlike autoencoders (AEs), a VAE does not encode the training data as isolated vectors; rather, it forces the latent variable to fill the space [45]. Therefore, input images can be encoded into latent variables via the encoder network, which is useful for image retrieval [46] and clustering [47, 48] tasks.…”
Section: Related Work
confidence: 99%
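The retrieval use case mentioned above amounts to nearest-neighbor search in the learned representation space. The sketch below assumes a hypothetical encode function (e.g. the encoder of a trained model returning a fixed-length vector per image); it is not taken from any of the cited papers.

```python
import numpy as np

def retrieve(query_code, database_codes, top_k=5):
    """Rank database items by Euclidean distance to the query in code space.

    query_code     : (D,) representation of the query image.
    database_codes : (N, D) representations of the database images.
    Returns the indices of the top_k closest database items.
    """
    d = np.linalg.norm(database_codes - query_code, axis=1)
    return np.argsort(d)[:top_k]

# usage sketch (encode() is a hypothetical trained encoder, not a real API):
# db = np.stack([encode(img) for img in database_images])
# hits = retrieve(encode(query_image), db, top_k=10)
```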
“…It is noteworthy that the feature decomposing network has the latent space as its bottleneck, with no bypass connection between the encoder and decoders, such as skip connections (Drozdzal et al., 2016). Therefore, we can expect that the information processed by the encoder can be compressed in the latent spaces (Razavi et al., 2019; Wu and Flierl, 2019).…”
Section: Feature Decomposing Network
confidence: 99%
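A minimal sketch of the architectural point in that statement, with illustrative shapes and hypothetical names (encoder, decoder, W_enc, W_dec): without skip connections, everything the decoder uses must pass through the low-dimensional latent code.

```python
import numpy as np

def encoder(x, W_enc):
    # maps the input to a much smaller bottleneck code
    return np.tanh(W_enc @ x)

def decoder(code, W_dec):
    # reconstruction depends on the bottleneck code alone
    return W_dec @ code

rng = np.random.default_rng(0)
x = rng.normal(size=64)
W_enc = 0.1 * rng.normal(size=(8, 64))    # 64-dim input -> 8-dim bottleneck
W_dec = 0.1 * rng.normal(size=(64, 8))
x_hat = decoder(encoder(x, W_enc), W_dec) # no path from x to the decoder except the 8-dim code
```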
“…The three techniques are not introduced by us and are already used in existing work related to VQ outside of the graph learning community. For example, product VQ is used in [45], and whitening is used in [46].…”
Section: Checklist
confidence: 99%
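Product VQ is already sketched after the abstract above; whitening is commonly a linear decorrelation of features applied before quantization or comparison. The PCA-whitening sketch below is a generic illustration under that assumption (pca_whiten is a hypothetical helper, not the procedure of [46]).

```python
import numpy as np

def pca_whiten(X, eps=1e-8):
    """PCA-whiten a feature matrix X of shape (N, D).

    Centers the features, rotates them onto the principal axes, and
    rescales each axis to unit variance, so the whitened features have
    an (approximately) identity covariance matrix.
    """
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric covariance
    W = eigvecs / np.sqrt(eigvals + eps)       # scale each principal axis
    return Xc @ W

# usage: whiten descriptors before quantization / nearest-neighbor search
X = np.random.default_rng(1).normal(size=(100, 16))
Xw = pca_whiten(X)
print(np.round(np.cov(Xw, rowvar=False), 2))   # approximately the identity matrix
```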