Proceedings of the 27th ACM International Conference on Multimedia 2019
DOI: 10.1145/3343031.3351030

Cross-Modal Subspace Learning with Scheduled Adaptive Margin Constraints

Abstract: Cross-modal embeddings, between textual and visual modalities, aim to organise multimodal instances by their semantic correlations. State-of-the-art approaches use maximum-margin methods, based on the hinge-loss, to enforce a constant margin m, to separate projections of multimodal instances from different categories. In this paper, we propose a novel scheduled adaptive maximum-margin (SAM) formulation that infers triplet-specific constraints during training, therefore organising instances by adaptively enforc…
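To make the contrast in the abstract concrete, below is a minimal sketch of the constant-margin baseline it refers to, written here in PyTorch; the function name, tensor shapes and the use of cosine similarity are assumptions for illustration, not taken from the paper.

import torch

def bidirectional_triplet_loss(img_emb, txt_emb, margin=0.2):
    # Hinge-based bidirectional ranking loss with a constant margin.
    # img_emb, txt_emb: (batch, dim) L2-normalised embeddings; row i of
    # each tensor forms a matching image-caption pair. This is the usual
    # max-margin baseline, not the paper's SAM loss.
    scores = img_emb @ txt_emb.t()               # pairwise cosine similarities
    pos = scores.diag().view(-1, 1)              # positive-pair scores
    cost_im = (margin + scores - pos).clamp(min=0)        # image -> text hinge
    cost_txt = (margin + scores - pos.t()).clamp(min=0)   # text -> image hinge
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_im = cost_im.masked_fill(mask, 0)       # ignore the positive pair itself
    cost_txt = cost_txt.masked_fill(mask, 0)
    return cost_im.sum() + cost_txt.sum()

The SAM formulation described in the abstract replaces the constant margin above with a triplet-specific value inferred during training.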

Cited by 10 publications (4 citation statements, published 2020–2024) · References 30 publications

Citation statements:
“…[39] augment the bidirectional contrastive loss by also summing the margin to the loss objective, to optimize it during the training process. For text-image retrieval, [49] propose a scheduled adaptive margin which starts from a fixed value and gradually changes during the training process both to integrate inter-category similarity-based correlations and to preserve the category clusters formed during the initial phases of the training. Recently, for cross-modal video retrieval [25] proposed an adaptive margin proportional to the similarity of the representations computed for the negative pair, both in terms of 'static' (pretrained, frozen) models, which provide initial supervision, and 'dynamic' (trained with the task) models, which provide supervision in later stages of the training.…”
Section: Related Work (mentioning)
confidence: 99%
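Read literally, the schedule described above can be pictured as an interpolation between a fixed starting margin and a similarity-aware target. The sketch below is one possible reading under that assumption (in Python, with a hypothetical category_similarity input in [0, 1]); it is not the authors' exact formulation.

def scheduled_adaptive_margin(base_margin, category_similarity, epoch, total_epochs):
    # Illustrative schedule only: starts at the fixed base margin and
    # gradually blends in a category-similarity term, so that the clusters
    # formed early in training are preserved. The linear schedule and the
    # [m, 2m] range are assumptions, not the paper's exact rule.
    progress = min(epoch / max(total_epochs, 1), 1.0)
    adaptive = base_margin * (2.0 - category_similarity)  # dissimilar categories get a larger margin
    return (1.0 - progress) * base_margin + progress * adaptive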
“…[11]) or adaptive solutions. In particular, [49] implemented a schedule for the margin value which gradually incorporates inter-category correlations and information about the structure of the embedding space. Recently, for video retrieval [25] proposed an adaptive margin proportional to the similarity of item and query as computed by multiple models.…”
Section: Introduction (mentioning)
confidence: 99%
“…Conventional triplet loss utilizes a fixed margin to push positive and negative pairs apart, which means it treats different training samples equally. Recently, many scholars have improved the effect by changing the fixed margin to an adaptive margin [3,16,22,30,39,42].…”
Section: Related Work (mentioning)
confidence: 99%
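As a point of reference for the statement above, the conventional formulation keeps a single constant margin $m$ for every triple, while the adaptive variants cited here let the margin depend on the triple itself; a schematic rendering (notation chosen here for illustration, not taken from any of the cited papers) is

$\mathcal{L}_{\mathrm{fixed}}(a,p,n) = \max\big(0,\; m + s(a,n) - s(a,p)\big)$, $\qquad$ $\mathcal{L}_{\mathrm{adaptive}}(a,p,n) = \max\big(0,\; m(a,n) + s(a,n) - s(a,p)\big)$,

where $s$ is a similarity score and $m(a,n)$ grows or shrinks with how different the anchor's and the negative's categories are.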
“…Hu et al. [16] introduced a new weighted adaptive margin ranking loss, speeding up training convergence and improving image retrieval accuracy. While most adaptive margins were proposed in the uni-modal domain, Semedo et al. [30] proposed a pair-specific margin for image-text retrieval based on category clusters and their preservation. Note that this method was based on the semantic category of the image, which does not exist in video retrieval.…”
Section: Related Work (mentioning)
confidence: 99%