Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval 2019
DOI: 10.1145/3331184.3331275

Deep Collaborative Discrete Hashing with Semantic-Invariant Structure

Abstract: Existing deep hashing approaches fail to fully explore semantic correlations and neglect the effect of linguistic context on visual attention learning, leading to inferior performance. This paper proposes a dual-stream learning framework, dubbed Deep Collaborative Discrete Hashing (DCDH), which constructs a discriminative common discrete space by collaboratively incorporating the shared and individual semantics deduced from visual features and semantic labels. Specifically, the context-aware representations ar…

Cited by 14 publications (4 citation statements)
References 17 publications
“…With the advent of multimedia streaming [3,23,40,41] and gaming data, automatically recognizing and understanding human actions and events in videos have become increasingly important, especially for practical tasks such as video retrieval [17], surveillance [28], and recommendation [42,43]. Over the past decades, great efforts have been made to boost the recognition performance with deep learning for different purposes including appearances and short-term motions learning [33,36], temporal structure modeling [39], and human skeleton and pose embedding [19,31,45].…”
Section: Introduction (mentioning)
confidence: 99%
“…(iii) DSH-Supervised is unsuitable for retrieval across a large number of categories due to the incident imbalanced input of positive and negative pairs [46]. We have also tried another very recently published pairwise similarity-preserving hashing model Deep Collaborative Discrete Hashing (DCDH) [47] as our baseline, however its performance equals to chance-performance, so that is not reported in Table II. This shows the importance of metric selection under universal (hundreds of categories) million-scale sketch hashing retrieval, where softmax cross entropy loss generally works better, while pairwise contrastive loss hardly constrains the feature representation space and word vector can be misleading, i.e., basketball and apple are similar in terms of shape abstraction, but pushing further away under semantic distance.…”
Section: Unsupervised (mentioning)
confidence: 99%
“…Learning to hash has arisen to be a promising choice because of its fast retrieval speed and low storage consumption (Li et al 2020; Chen et al 2021b; Weng and Zhu 2021; Liu et al 2019b). Roughly speaking, we could divide existing methods into uni-modal hashing (Shi et al 2022; Wang et al 2018a; Liu et al 2019a; Wang et al 2019; Luo et al 2019), cross-modal hashing (Liu et al 2019c; Xie et al 2020; Jin, Li, and Tang 2020; Nie et al 2020; Hu et al 2021), and multi-modal hashing (Liu et al 2012; Shen et al 2015; Zhu et al 2020a). Thereinto, multi-modal hashing requires that both database and query samples provide heterogeneous multi-modal features.…”
Section: Introduction (mentioning)
confidence: 99%