Deep Collaborative Discrete Hashing with Semantic-Invariant Structure

Wang, Zijian; Zhang, Zheng; Luo, Yadan; Huang, Zi

doi:10.1145/3331184.3331275

Cited by 14 publications

(4 citation statements)

References 17 publications

“…With the advent of multimedia streaming [3,23,40,41] and gaming data, automatically recognizing and understanding human actions and events in videos have become increasingly important, especially for practical tasks such as video retrieval [17], surveillance [28], and recommendation [42,43]. Over the past decades, great efforts have been made to boost the recognition performance with deep learning for different purposes including appearances and short-term motions learning [33,36], temporal structure modeling [39], and human skeleton and pose embedding [19,31,45].…”

Section: Introduction (mentioning)

confidence: 99%

Adversarial Bipartite Graph Learning for Video Domain Adaptation

Luo, Huang, Wang, et al. 2020

Proceedings of the 28th ACM International Conference on Multimedia

Self Cite

Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area due to the significant spatial and temporal shifts across the source (i.e. training) and target (i.e. test) domains. As such, recent works on visual domain adaptation which leverage adversarial learning to unify the source and target video representations and strengthen the feature transferability are not highly effective on videos. To overcome this limitation, in this paper, we learn a domain-agnostic video classifier instead of learning domain-invariant representations, and propose an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions with a network topology of the bipartite graph. Specifically, the source and target frames are sampled as heterogeneous vertexes while the edges connecting two types of nodes measure the affinity among them. Through message-passing, each vertex aggregates the features from its heterogeneous neighbors, forcing the features coming from the same class to be mixed evenly. Explicitly exposing the video classifier to such cross-domain representations at the training and test stages makes our model less biased to the labeled source data, which in turn results in better generalization on the target domain. The proposed framework is agnostic to the choices of frame aggregation, and therefore, four different aggregation functions are investigated for capturing appearance and temporal dynamics. To further enhance the model capacity and test the robustness of the proposed architecture on difficult transfer tasks, we extend our model to work in a semi-supervised setting using an additional video-level bipartite graph. Extensive experiments conducted on four benchmark datasets demonstrate the effectiveness of the proposed approach over the state-of-the-art methods on the task of video recognition.
CCS Concepts: Computing methodologies → Transfer learning; Activity recognition and understanding.

“…(iii) DSH-Supervised is unsuitable for retrieval across a large number of categories due to the incident imbalanced input of positive and negative pairs [46]. We have also tried another very recently published pairwise similarity-preserving hashing model Deep Collaborative Discrete Hashing (DCDH) [47] as our baseline, however its performance equals to chance-performance, so that is not reported in Table II. This shows the importance of metric selection under universal (hundreds of categories) million-scale sketch hashing retrieval, where softmax cross entropy loss generally works better, while pairwise contrastive loss hardly constrains the feature representation space and word vector can be misleading, i.e., basketball and apple are similar in terms of shape abstraction, but pushing further away under semantic distance.…”

Section: Unsupervised (mentioning)

confidence: 99%

On Learning Semantic Representations for Million-Scale Free-Hand Sketches

Xu, Huang, Yuan, et al. 2020

Preprint

In this paper, we study learning semantic representations for million-scale free-hand sketches. This is highly challenging due to the domain-unique traits of sketches, e.g., diverse, sparse, abstract, noisy. We propose a dual-branch CNN-RNN network architecture to represent sketches, which simultaneously encodes both the static and temporal patterns of sketch strokes. Based on this architecture, we further explore learning the sketch-oriented semantic representations in two challenging yet practical settings, i.e., hashing retrieval and zero-shot recognition on million-scale sketches. Specifically, we use our dual-branch architecture as a universal representation framework to design two sketch-specific deep models: (i) We propose a deep hashing model for sketch retrieval, where a novel hashing loss is specifically designed to accommodate both the abstract and messy traits of sketches. (ii) We propose a deep embedding model for sketch zero-shot recognition, via collecting a large-scale edge-map dataset and proposing to extract a set of semantic vectors from edge-maps as the semantic knowledge for sketch zero-shot domain alignment. Both deep models are evaluated by comprehensive experiments on million-scale sketches and outperform the state-of-the-art competitors.

“…Learning to hash has arisen to be a promising choice because of its fast retrieval speed and low storage consumption (Li et al 2020; Chen et al 2021b; Weng and Zhu 2021; Liu et al 2019b). Roughly speaking, we could divide existing methods into uni-modal hashing (Shi et al 2022; Wang et al 2018a; Liu et al 2019a; Wang et al 2019; Luo et al 2019), cross-modal hashing (Liu et al 2019c; Xie et al 2020; Jin, Li, and Tang 2020; Nie et al 2020; Hu et al 2021), and multi-modal hashing (Liu et al 2012; Shen et al 2015; Zhu et al 2020a). Thereinto, multi-modal hashing requires that both database and query samples provide heterogeneous multi-modal features.…”

Section: Introduction (mentioning)

confidence: 99%

Online Enhanced Semantic Hashing: Towards Effective and Efficient Retrieval for Streaming Multi-Modal Data

Luo, Zhan, et al. 2022

AAAI

With the vigorous development of multimedia equipment and applications, efficient retrieval of large-scale multi-modal data has become a popular research topic. Among the available approaches, hashing has become a prevalent choice due to its retrieval efficiency and low storage cost. Although multi-modal hashing has drawn a lot of attention in recent years, some problems remain. First, existing methods are mainly designed in batch mode and cannot efficiently handle streaming multi-modal data. Second, all existing online multi-modal hashing methods fail to effectively handle unseen new classes that arrive continuously with streaming data chunks. In this paper, we propose a new model, termed Online enhAnced SemantIc haShing (OASIS). We design a novel semantic-enhanced representation for data, which helps handle newly arriving classes, and thereby construct the enhanced semantic objective function. An efficient and effective discrete online optimization algorithm is further proposed for OASIS. Extensive experiments show that our method can exceed the state-of-the-art models. For good reproducibility and to benefit the community, our code and data are already publicly available.
