Cross-Modal Subspace Learning with Scheduled Adaptive Margin Constraints

Semedo, David; Magalhães, João

doi:10.1145/3343031.3351030

Cited by 10 publications

(4 citation statements)

References 30 publications

“…[39] augment the bidirectional contrastive loss by also adding the margin to the loss objective, to optimize it during the training process. For text-image retrieval, [49] propose a scheduled adaptive margin which starts from a fixed value and gradually changes during the training process both to integrate inter-category similarity-based correlations and to preserve the category clusters formed during the initial phases of the training. Recently, for cross-modal video retrieval [25] proposed an adaptive margin proportional to the similarity of the representations computed for the negative pair, both in terms of 'static' (pretrained, frozen) models, which provide initial supervision, and 'dynamic' (trained with the task) models, which provide supervision in later stages of the training.…”

Section: Related Work (mentioning)

confidence: 99%
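As a rough illustration of the scheduled margin described in this statement, the Python sketch below starts every negative at a fixed margin and linearly moves it toward a pair-specific target as training progresses. The function names, the linear schedule, the `warmup_epochs` parameter and the rule that shrinks the target margin by an inter-category similarity in [0, 1] are all assumptions made for illustration; this is not the exact schedule of Semedo and Magalhães.

```python
def scheduled_margin(epoch: int, base_margin: float, category_sim: float,
                     warmup_epochs: int = 10) -> float:
    """Hypothetical margin schedule: fixed early in training, pair-specific later.

    category_sim is an assumed inter-category similarity in [0, 1] between the
    anchor's category and the negative's category.
    """
    if not 0.0 <= category_sim <= 1.0:
        raise ValueError("category_sim must lie in [0, 1]")
    if warmup_epochs <= 0:
        raise ValueError("warmup_epochs must be positive")
    # Assumed target: negatives from similar categories get a smaller margin.
    target = base_margin * (1.0 - category_sim)
    # Fraction of the schedule completed, clamped to [0, 1].
    t = min(max(epoch / warmup_epochs, 0.0), 1.0)
    # Linear interpolation from the fixed margin towards the adaptive target.
    return (1.0 - t) * base_margin + t * target


def hinge_term(sim_pos: float, sim_neg: float, margin: float) -> float:
    """Triplet hinge term: zero once the negative is at least `margin` below the positive."""
    return max(0.0, margin - sim_pos + sim_neg)
```

With `base_margin=0.2` and `category_sim=0.5`, the margin is 0.2 at epoch 0 and settles at 0.1 once `epoch >= warmup_epochs`, so negatives from closely related categories are pushed away less forcefully later in training.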

“…[11]) or adaptive solutions. In particular, [49] implemented a schedule for the margin value which gradually incorporates inter-category correlations and information about the structure of the embedding space. Recently, for video retrieval [25] proposed an adaptive margin proportional to the similarity of item and query as computed by multiple models.…”

Section: Introduction (mentioning)

confidence: 99%

Relevance-based Margin for Contrastively-trained Video Retrieval Models

Falcon, Sudhakaran, Serra, et al., 2022

Proceedings of the 2022 International Conference on Multimedia Retrieval

Video retrieval using natural language queries has attracted increasing interest due to its relevance in real-world applications, from intelligent access in private media galleries to web-scale video search. Learning the cross-similarity of video and text in a joint embedding space is the dominant approach. To do so, a contrastive loss is usually employed because it organizes the embedding space by putting similar items close together and dissimilar items far apart. This framework leads to competitive recall rates, as it focuses solely on the rank of the ground-truth items. Yet, assessing the quality of the ranking list is of utmost importance when considering intelligent retrieval systems, since multiple items may share similar semantics and hence have high relevance. Moreover, the aforementioned framework uses a fixed margin to separate similar and dissimilar items, treating all non-ground-truth items as equally irrelevant. In this paper we propose to use a variable margin: we argue that varying the margin used during training based on how relevant an item is to a given query, i.e. a relevance-based margin, easily improves the quality of the ranking lists as measured by nDCG and mAP. We demonstrate the advantages of our technique using different models on EPIC-Kitchens-100 and YouCook2. We show that even if we carefully tuned the fixed margin, our technique (which does not have the margin as a hyper-parameter) would still achieve better performance. Finally, extensive ablation studies and qualitative analysis support the robustness of our approach. Code will be released at https://github.com/aranciokov/RelevanceMargin-ICMR22.
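A minimal sketch of a relevance-based margin inside a bidirectional hinge loss over a batch follows. It assumes the margin for each negative is simply one minus its relevance to the query, so no fixed margin has to be tuned, and it uses a hypothetical word-overlap (Jaccard) relevance between captions. Both choices, and the function names, are illustrative assumptions rather than the authors' released code.

```python
from typing import List

Matrix = List[List[float]]


def jaccard_relevance(caption_a: str, caption_b: str) -> float:
    """Hypothetical relevance: word-set overlap (IoU) between two captions."""
    words_a = set(caption_a.lower().split())
    words_b = set(caption_b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def relevance_margin_loss(sim: Matrix, rel: Matrix) -> float:
    """Bidirectional hinge loss with a per-pair margin of (1 - relevance).

    sim[i][j]: similarity between text query i and video j; (i, i) is ground truth.
    rel[i][j]: assumed relevance in [0, 1] of video j to query i.
    """
    n = len(sim)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Text -> video: video j should score below ground-truth video i for query i.
            loss += max(0.0, (1.0 - rel[i][j]) - sim[i][i] + sim[i][j])
            # Video -> text: query j should score below ground-truth query i for video i.
            loss += max(0.0, (1.0 - rel[j][i]) - sim[i][i] + sim[j][i])
    return loss / n
```

When an off-diagonal video is fully relevant (`rel = 1.0`) its margin drops to zero, so a tie with the ground truth is no longer penalized; an irrelevant one (`rel = 0.0`) keeps the full margin of 1.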

“…Conventional triplet loss uses a fixed margin to push positive and negative pairs apart, which means it treats all training samples equally. Recently, many researchers have improved performance by replacing the fixed margin with an adaptive margin [3,16,22,30,39,42,42].…”

Section: Related Work (mentioning)

confidence: 99%
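The point that a fixed margin "treats all training samples equally" can be made concrete with a conventional fixed-margin triplet loss on toy vectors. The cosine similarity, the two-dimensional vectors, the function names and the margin of 0.2 are illustrative assumptions.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def fixed_margin_triplet(anchor: Sequence[float], positive: Sequence[float],
                         negative: Sequence[float], margin: float = 0.2) -> float:
    """Conventional triplet loss: every negative must sit `margin` below the positive."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))
```

A negative identical to the positive, e.g. `fixed_margin_triplet([1, 0], [1, 0], [1, 0])`, still costs the full margin of about 0.2, which is the behavior adaptive margins aim to relax.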

“…Hu et al. [16] introduced a new weighted adaptive margin ranking loss, which speeds up training convergence and improves image retrieval accuracy. While most adaptive margins were proposed in the uni-modal domain, Semedo et al. [30] proposed a pair-specific margin for image-text retrieval based on category clusters and their preservation. Note that this method relies on the semantic category of the image, which is not available in video retrieval.…”

Section: Related Work (mentioning)

confidence: 99%

Improving Video Retrieval by Adaptive Margin

Wang, Feng, et al., 2021

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

Video retrieval is becoming increasingly important owing to the rapid emergence of videos on the Internet. The dominant paradigm for video retrieval learns video-text representations by pushing the similarity of positive pairs and that of negative pairs apart by a fixed margin. However, negative pairs used for training are sampled randomly, which means that the video and text in a negative pair may be semantically related or even equivalent, while most methods still enforce dissimilar representations to decrease their similarity. This phenomenon leads to inaccurate supervision and poor performance in learning video-text representations. While most video retrieval methods overlook this phenomenon, we propose an adaptive margin that changes with the distance between positive and negative pairs to solve the aforementioned issue. First, we design the calculation framework of the adaptive margin, including the method of distance measurement and the function between the distance and the margin. Then, we explore a novel implementation called "Cross-Modal Generalized Self-Distillation" (CMGSD), which can be built on top of most video retrieval models with few modifications. Notably, CMGSD adds little computational overhead at training time and none at test time. Experimental results on three widely used datasets demonstrate that the proposed method yields significantly better performance than the corresponding backbone model and outperforms state-of-the-art methods by a large margin.
CCS Concepts: Information systems → Video search.
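A margin that changes with the distance between the positive and the negative can be sketched with the two components the abstract names: a distance measure and a function mapping distance to margin. Here the cosine distance, the clipped linear mapping, the function names and the reference representations `pos_repr`/`neg_repr` (for instance, from a frozen model) are assumptions; this is not the CMGSD implementation.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def distance_to_margin(distance: float, scale: float = 0.2,
                       m_min: float = 0.0, m_max: float = 0.2) -> float:
    """Hypothetical monotone mapping from pair distance to margin, clipped to [m_min, m_max]."""
    return min(max(scale * distance, m_min), m_max)


def adaptive_margin_hinge(sim_pos: float, sim_neg: float,
                          pos_repr: Sequence[float], neg_repr: Sequence[float]) -> float:
    """Hinge loss whose margin grows with the distance between positive and negative items.

    pos_repr / neg_repr are assumed reference representations of the two items;
    semantically equivalent items receive a margin near zero.
    """
    margin = distance_to_margin(cosine_distance(pos_repr, neg_repr))
    return max(0.0, margin - sim_pos + sim_neg)
```

If the negative's representation points in the same direction as the positive's, the distance and therefore the margin are zero, so the negative is not pushed below the positive; orthogonal representations receive the maximum margin of 0.2.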

Learning Semantic-Visual Embeddings with a Priority Queue

Valerio, Magalhães, 2023

Lecture Notes in Computer Science
