2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.104
Deep Visual-Semantic Quantization for Efficient Image Retrieval

Cited by 155 publications (171 citation statements) | References 22 publications
“…Figure 5 shows the t-SNE visualizations [31] of the deep representations learned by DVSQ [1], DTQ-2, and DTQ on the CIFAR-10 dataset. The deep representations of the proposed DTQ exhibit clear discriminative structures, with data points in different categories well separated, while the deep representations of DVSQ [1] exhibit relatively vague structures. This validates that, by introducing triplet training into deep quantization, the deep representations generated by our DTQ are more discriminative than those generated by DVSQ, enabling more accurate image retrieval.…”
Section: Triplet (mentioning)
Confidence: 99%
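As a rough illustration of the visualization referenced above, the following sketch projects learned feature vectors to 2-D with t-SNE and colors the points by class. The features and labels arrays are random placeholders standing in for deep representations and CIFAR-10 labels; this is a minimal sketch of the plotting step, not the DTQ or DVSQ code.

    # Minimal sketch: t-SNE projection of learned deep features, colored by class.
    # features (N x D) and labels (N,) are random placeholders, not outputs of
    # the actual DTQ or DVSQ models.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(0)
    features = rng.normal(size=(1000, 128))   # placeholder deep representations
    labels = rng.integers(0, 10, size=1000)   # placeholder CIFAR-10 class labels

    # Project to 2-D; perplexity around 30 is a common choice at this scale.
    embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

    plt.figure(figsize=(6, 6))
    plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab10", s=5)
    plt.title("t-SNE of learned representations (placeholder data)")
    plt.show()

Well-separated, compact clusters in such a plot correspond to what the statement above calls clear discriminative structure.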
“…Supervised methods, in contrast, can exploit semantic labels to enhance cross-modal correlations and reduce the semantic gap [19]; hence they generally outperform unsupervised methods for cross-modal search. The latest cross-modal retrieval models based on deep learning [21,28,6,24,4] have shown that deep models can distill complex cross-modal correlations more effectively. Despite the success of deep models for cross-modal search, existing deep cross-modal retrieval methods are mainly unsupervised and not tailored to hash function learning.…”
Section: Introduction (mentioning)
Confidence: 99%
“…More specifically: (1) We explore the feature correlation by reconstructing the feature vectors of one modality from the corresponding hash codes of the other modality, capturing the cross-modal correlations revealed by the feature vectors; (2) We explore the semantic correlation by maximizing the inter-category separation margin and minimizing the intra-category variance, which produces more discriminative and semantically consistent hash codes; (3) Since cross-modal data (e.g., images and text) are heterogeneous and difficult to correlate with linear or shallow models [28,6,4], we enhance both cross-modal correlations in a deep architecture, which makes the embedded hash codes generalize better across modalities. Comprehensive results on large-scale benchmarks show that CAH significantly outperforms state-of-the-art cross-modal hashing methods.…”
Section: Introduction (mentioning)
Confidence: 99%
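To make the two correlation terms described in this statement concrete, here is a minimal PyTorch sketch of a loss with that structure: a cross-modal reconstruction term plus a margin-based semantic term. All names, dimensions, and the margin value are illustrative assumptions, not the cited CAH implementation.

    # Rough sketch of (1) cross-modal feature reconstruction and (2) an
    # inter-/intra-category margin term, assuming relaxed (real-valued) codes.
    # Names and dimensions are assumptions, not the actual CAH model.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossModalCorrelationLoss(nn.Module):
        def __init__(self, code_len=32, img_dim=4096, txt_dim=1000, margin=1.0):
            super().__init__()
            # Decoders reconstructing one modality's features from the other
            # modality's hash codes (feature correlation).
            self.img_from_txt = nn.Linear(code_len, img_dim)
            self.txt_from_img = nn.Linear(code_len, txt_dim)
            self.margin = margin

        def forward(self, img_feat, txt_feat, img_code, txt_code, labels):
            # (1) Feature correlation: reconstruct each modality's features
            # from the other modality's codes.
            recon = F.mse_loss(self.img_from_txt(txt_code), img_feat) + \
                    F.mse_loss(self.txt_from_img(img_code), txt_feat)

            # (2) Semantic correlation: small distances within a category,
            # at least a margin of separation across categories.
            dist = torch.cdist(img_code, txt_code)
            same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
            intra = (same * dist.pow(2)).sum() / same.sum().clamp(min=1)
            inter = ((1 - same) * F.relu(self.margin - dist).pow(2)).sum() \
                    / (1 - same).sum().clamp(min=1)

            return recon + intra + inter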
“…Compared with unsupervised methods [4,19], supervised methods [5,14,17,18,20] can yield better performance with the support of label supervision. With the rapid development of deep neural networks, deep hashing methods [1,2,7,11,12,16,21] have demonstrated superior performance over non-deep hashing methods and achieved state-of-the-art results on public benchmarks. However, in mainstream deep hashing frameworks, human-annotated labels supervise only the distribution alignment of the hash code embedding; they fail to trigger context-aware visual representation learning, let alone optimal binary code generation.…”
Section: Introduction (mentioning)
Confidence: 99%
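For context on the label supervision this statement refers to, a minimal sketch of a generic pairwise, label-supervised hashing objective is shown below. It is an illustrative assumption (tanh-relaxed codes with a pairwise similarity term and a quantization penalty), not any of the specific cited methods.

    # Minimal sketch of a label-supervised pairwise hashing loss. Generic
    # illustration only; not any specific cited deep hashing method.
    import torch
    import torch.nn.functional as F

    def pairwise_hashing_loss(logits, labels, quant_weight=0.1):
        """logits: (N, K) real-valued hashing-head outputs; labels: (N,) class ids."""
        codes = torch.tanh(logits)                  # relaxed binary codes in (-1, 1)
        sim = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
        # Inner products of K-bit codes, scaled to roughly [-1, 1].
        inner = codes @ codes.t() / codes.shape[1]
        # Same-class pairs are pushed toward +1, different-class pairs toward -1.
        pair_loss = F.mse_loss(inner, 2 * sim - 1)
        # Quantization term pushes relaxed codes toward exactly +/-1.
        quant_loss = (codes.abs() - 1).pow(2).mean()
        return pair_loss + quant_weight * quant_loss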