2021
DOI: 10.48550/arxiv.2106.04990
Preprint

It Takes Two to Tango: Mixup for Deep Metric Learning

Abstract: Metric learning involves learning a discriminative representation such that embeddings of similar classes are encouraged to be close, while embeddings of dissimilar classes are pushed far apart. State-of-the-art methods focus mostly on sophisticated loss functions or mining strategies. On the one hand, metric learning losses consider two or more examples at a time. On the other hand, modern data augmentation methods for classification consider two or more examples at a time. The combination of the two ideas is…
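As a rough illustration of the combination the abstract describes, below is a minimal sketch of mixup used together with a pairwise metric-learning loss, assuming a PyTorch-style setup. The names (`mixup_batch`, `contrastive_loss`, `embed_net`) are hypothetical, and this is not the authors' exact formulation.

```python
# Minimal sketch: mixup combined with a pairwise metric-learning loss.
# Illustrative only; not the exact method of the paper above.
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=0.2):
    """Interpolate a batch with a shuffled copy of itself (standard mixup)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    return x_mix, y, y[perm], lam

def contrastive_loss(z, y, margin=0.5):
    """Pairwise contrastive loss on L2-normalised embeddings z with labels y."""
    z = F.normalize(z, dim=1)
    dist = torch.cdist(z, z)                                  # pairwise distances
    same = (y.unsqueeze(0) == y.unsqueeze(1)).float()         # same-class mask
    pos = same * dist.pow(2)                                  # pull similar classes together
    neg = (1.0 - same) * F.relu(margin - dist).pow(2)         # push dissimilar classes apart
    mask = 1.0 - torch.eye(len(y), device=z.device)           # ignore self-pairs
    return ((pos + neg) * mask).sum() / mask.sum()

def mixup_metric_step(embed_net, x, y, alpha=0.2):
    """One training step: mix the inputs, embed them, and mix the loss targets.
    Mixing the two losses is one assumed way of defining mixed targets."""
    x_mix, y_a, y_b, lam = mixup_batch(x, y, alpha)
    z = embed_net(x_mix)
    return lam * contrastive_loss(z, y_a) + (1.0 - lam) * contrastive_loss(z, y_b)
```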

Cited by 5 publications (10 citation statements)
References 40 publications

“…Consequently, we introduce metric-learning regularization terms in the original problem (equation (6)), which we call FP-Metric. Metric learning (Hoffer and Ailon 2015; Kaya and Bilge 2019) is a well-known approach to learning appropriate representations via FP and positive samples in the computer vision (Karpusha, Yun, and Fehervari 2020; Venkataramanan et al. 2021) and audio (Chung et al. 2020; Xu et al. 2020) domains. Metric learning has also been attracting great attention recently due to its high performance in self-supervised and unsupervised approaches (Jaiswal et al. 2021).…”
Section: Theoretical Analysis
confidence: 99%
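The passage above adds metric-learning terms as a regularizer on top of an existing objective. A minimal sketch of that general pattern, using a triplet margin term in the spirit of Hoffer and Ailon (2015), is given below; the anchor/positive/negative selection and the weight `beta` are assumptions, not details taken from the cited work.

```python
# Hedged sketch: a metric-learning (triplet) term used as a regularizer.
import torch
import torch.nn.functional as F

def triplet_regularizer(anchor, positive, negative, margin=0.2):
    """Triplet margin term on embedding vectors (Hoffer & Ailon style)."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

def total_loss(task_loss, anchor, positive, negative, beta=0.1):
    """Original objective plus a weighted metric-learning regularization term."""
    return task_loss + beta * triplet_regularizer(anchor, positive, negative)
```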
“…Such a large batch is used for the first time in several of the employed image retrieval benchmarks. The batch size is virtually increased by another contribution, a computationally efficient mixup technique called SiMix, which operates on the similarity scores instead of the embedding vectors as in prior work [55,13,12]. If the training set is not large enough and all of its classes are used to form a single batch, SiMix proves essential for virtually increasing the batch size and significantly boosting performance.…”
Section: Query Ranked Database Images
confidence: 99%
“…Linearly interpolating labels entails the risk of generating false negatives if the interpolation factor is close to 0 or 1. Such limitations are overcome in the work of Venkataramanan et al. [55], which generalizes mixing examples from different classes for pairwise loss functions. The proposed SiMix approach differs from the aforementioned techniques as it operates on the similarity scores instead of the embedding vectors, making it computationally efficient.…”
Section: Related Work
confidence: 99%
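To make the contrast drawn in the quotes above concrete, the snippet below sketches mixing at the embedding level versus mixing at the similarity-score level, assuming cosine similarity on L2-normalised embeddings. It is a hypothetical illustration of the distinction only, not the actual SiMix algorithm from the cited work.

```python
# Illustration only: embedding-level vs. similarity-level mixing.
import torch
import torch.nn.functional as F

def embedding_level_mix(z_a, z_b, queries, lam=0.7):
    """Prior-work style: interpolate embeddings, re-normalise, then score."""
    z_mix = F.normalize(lam * z_a + (1.0 - lam) * z_b, dim=1)
    return queries @ z_mix.T          # similarity of queries to mixed embeddings

def similarity_level_mix(z_a, z_b, queries, lam=0.7):
    """SiMix-style idea: score first, then interpolate the similarity scores,
    avoiding the construction of extra mixed embedding vectors."""
    return lam * (queries @ z_a.T) + (1.0 - lam) * (queries @ z_b.T)
```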
“…Note that architectures, optimizers, and pooling layers differ slightly between the different implementations. For SOP and Landmarks, we use ResNet-18, GeM pooling, and SGD; [38]-R-GeM and [40] use ResNet-101, GeM pooling, and Adam; [33] uses BN-Inception, a linear projection, and RMSProp; [34] uses GoogLeNet, a linear projection, and SGD; and [47] uses a ResNet-50, a combination of average and max pooling followed by a linear projection, and AdamW. For TinyImageNet, we use ResNet-32 and SGD, and [36] uses ResNet-32 and Adam.…”
Section: D1 Switching the Task And The Loss
confidence: 99%