Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021
DOI: 10.1145/3404835.3462974

Hierarchical Cross-Modal Graph Consistency Learning for Video-Text Retrieval

Cited by 31 publications (9 citation statements)
References 37 publications
“…Recently, HGR (Chen et al 2020) is proposed to divide the sentence into three parts: events, action, and entity, with the hypothesis that all the captions own a relatively fixed hierarchical graph form. HCGC (Jin et al 2021) adopts the same sentence resolving strategy and introduces the hierarchical cross-modal graph consistency learning into embedding space.…”
Section: Related Work (mentioning)
confidence: 99%
“…The heterogeneity of structures. This mainly lies in the impossibility of directly aligning the words in sentences with corresponding video frames (Jin et al 2021). Single-stream or two-stream structures are applied to treat text and video as two independent parts for early or late fusion, which ignore the internal relevancy between frames and words, resulting in that the models require massive data to reach decent performance.…”
Section: Introduction Motivation (mentioning)
confidence: 99%
“…Inspired by the investigation of graph consistency in [16,55], to explore the agreement property of each item based on the augmented views, we define knowledge graph structure consistency c_i of item i with the agreement between the representations encoded from different views as follows:…”
Section: Agreement Between Augmented Structural Views (mentioning)
confidence: 99%
“…DNNs are known to have enticing representation capability and have the natural strength to capture comprehensive relations [76] over different entities (e.g., items, users, interactions). Recently, there are works that explore advanced techniques, e.g., memory networks [53], attention mechanisms [56,79], and graph neural networks [9,26,31,36,81] for sequential recommendation [6,23,29,54,61,67,72]. Typically, MIND [32] adopts the dynamic routing mechanism to aggregate users' behaviors into multiple interest vectors.…”
Section: Related Work 2.1 Sequential Recommendation (mentioning)
confidence: 99%