Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization

Ramé, Alexandre; Dancette, Corentin; Cord, Matthieu

doi:10.48550/arxiv.2109.02934

Cited by 8 publications

(13 citation statements)

References 28 publications


“…In addition, incremental learning requires old data from memory storage while our prompt-based learning method has no access to the pre-trained data. Another related field that leverages gradient matching to transfer knowledge is domain generalization [44,37] and multi-task learning [43,52]. However, their methods are not directly applicable in prompt tuning whose transfer direction is only from general to downstream.…”

Section: Related Work

mentioning

confidence: 99%

“…* indicates results copied from [7].

… ± 1.24 | 43.74 ± 1.37 | 53.33 ± 1.57 | 61.26 ± 1.45 | 65.00 ± 2.87
CoOp | 59.44 ± 1.88 | 62.31 ± 1.40 | 66.72 ± 0.93 | 70.06 ± 0.53 | 73.48 ± 0.39
ProGrad | 32.29 ± 1.12 | 46.14 ± 1.49 | 55.18 ± 1.99 | 62.05 ± 0.93 | 66.47 ± 1.69
CoOp + l2 prompt reg | 60.84 ± 1.16 | 62.75 ± 1.18 | 66.85 ± 0.76 | 70.08 ± 0.58 | 72.92 ± 0.46
CoOp + GM | 61.27 ± 0.96 | 63.23 ± 0.50 | 64.59 ± 0.63 | 66.40 ± 0.49 | 67.12 ± 0.29
CoOp + KD | 61.52 ± 0.99 | 64.07 ± 0.52 | 66.52 ± 0.38 | 70.01 ± 0.31 | 72.01 ± 0.37
CoOp + ProGrad | 62.61 ± 0.80 | 64.90 ± 0.86 | 68.45 ± 0.52 | 71.41 ± 0.49 | 73.95 ± 0.42

ImageNet
Cosine | 15.95 ± 0.07 | 26.56 ± 0.30 | 37.08 ± 0.29 | 46.18 ± 0.19 | 53.36 ± 0.39
CoOp | 57.15 ± 1.03 | 57.25 ± 0.43 | 59.51 ± 0.25 | 61.59 ± 0.17 | 63.00 ± 0.18
ProGrad | 19.21 ± 0.28 | 31.18 ± 0.18 | 42.59 ± 0.29 | 51.73 ± 0.18 | 57.65 ± 0.33
CoOp + l2 prompt reg | 57.51 ± 0.22 | 61.27 ± 0.49 | 62.49 ± 0.12 | 62.71 ± 0.01 | 62.88 ± 0.09
CoOp + GM | 60.41 ± 0.17 | 60.51 ± 0.13 | 60.75 ± 0.06 | 61.01 ± 0.14 | 61.44 ± 0.03
CoOp + KD | 60.85 ± 0.22 | 61.08 ± 0.10 | 61.51 ± 0.07 | 61.67 ± 0.12 | 62.05 ± 0.09
CoOp + ProGrad | 57.75 ± 0.24 | 59.75 ± 0.33 | 61.46 ± 0.07 | 62.54 ± 0.03 | 63.45 ± 0.08

Caltech101
Cosine | 60.76 ± 1.71 | 73.10 ± 1.01 | 81.43 ± 0.65 | 87.02 ± 0.60 | 90.60 ± 0.05
CoOp | 87.40 ± 0.98 | 87.92 ± 1.12 | 89.48 ± 0.47 | 90.25 ± 0.18 | 92.00 ± 0.02
ProGrad | 61.95 ± 0.12 | 75.24 ± 0.88 | 82.98 ± 0.38 | 88.59 ± 0.21 | 91.31 ± 0.19
CoOp + l2 prompt reg | 87.04 ± 0.61 | 87.37 ± 0.78 | 88.82 ± 0.40 | 89.62 ± 0.29 | 91.67 ± 0.26
CoOp + GM | 89.14 ± 0.15 | 89.37 ± 0.26 | 89.64 ± 0.33 | 89.36 ± 0.31 | 89.42 ± 0.13
CoOp + KD | 89.06 ± 0.29 | 89.71 ± 0.20 | 90.13 ± 0.16 | 90.09 ± 0.30 | 91.39 ± 0.05
CoOp + ProGrad | 88.68 ± 0.34 | 87.98 ± 0.69 | 89.99 ± 0.26 | 90.83 ± 0.07 | 92.10 ± 0.39

OxfordPets
Cosine | 26.33 ± 0.75 | 41.60 ± 1.93 | 55.29 ± 1.97 | 66.60 ± 0.82 | 66.84 ± 16.24
CoOp | 86.01 ± 0.47 | 82.21 ± 2.12 | 86.63 ± 1.02 | 85.15 ± 1.12 | 87.06 ± 0.88
ProGrad | 26.08 ± 0.73 | 40.58 ± 2.01 | 55.23 ± 1.44 | 66.78 ± 1.58 | 68.96 ± 14.35
CoOp + l2 prompt reg | 87.55 ± 0.15 | 82.12 ± 2.61 | 84.93 ± 1.77 | 84.38 ± 0.75 | 86.28 ± 0.45
CoOp + GM | 87.05 ± 0.65 | 87.06 ± 0.67 | 88.45 ± 0.45 | 88.35 ± 0.15 | 88.38 ± 0.27
CoOp + KD | 87.10 ± 1.47 | 87.40 ± 0.60 | 88.56 ± 0.19 | 88.77 ± 0.24 | 89.16 ± 0.16
CoOp + ProGrad | 88.36 ± 0.73 | 86.89 ± 0.42 | 88.04 ± 0.50 | 87.91 ± 0.54 | 89.00 ± 0.32

StanfordCars
Cosine | 18.96 ± 0.34 | 33.37 ± 0.38 | 47.75 ± 0.38 | 61.30 ± 0.25 | 71.94 ± 0.31
CoOp | 55.68 ± 1.23 | 58.33 ± 0.60 | 63.05 ± 0.09 | 68.37 ± 0.25 | 73.34 ± 0.49
ProGrad | 21.13 ± 0.50 | 39.44 ± 0.83 | 54.54 ± 0.57 | 66.47 ± 0.14 | 73.41 ± 0.11
CoOp + l2 prompt reg | 55.86 ± 0.66 | 57.69 ± 0.51 | 62.82 ± 0.07 | 66.63 ± 0.25 | 69.86 ± 0.44
CoOp + GM | 57.37 ± 0.36 | 58.46 ± 0.24 | 59.72 ± 0.66 | 62.32 ± 0.59 | 63.87 ± 0.37
CoOp + KD | 57.48 ± 1.47 | 59.09 ± 0.60 | 61.47 ± 0.19 | 67.73 ± 0.24 | 70.48 ± 0.16
CoOp + ProGrad | 58.38 ± 0.23 | 61.81 ± 0.45 | 65.62 ± 0.43 | 69.29 ± 0.11 | 73.46 ± 0.29

Flowers102
Cosine | 51.33 ± 2.77 | 70.06 ± 2.29 | 82.43 ± 1.65 | 91.74 ± 0.73 | 95.68 ± 0.22
CoOp | 68.13 ± 1.74 | 76.68 ± 1.82 | 86.13 ± 0.75 | 91.74 ± 0.49 | 94.72 ± 0.34
… ± 2.31 | 70.13 ± 1.90 | 81.09 ± 2.06 | 91.62 ± 0.41 | 93.94 ± 0.02
CoOp + l2 prompt reg | 71.12 ± 0.55 | 80.36 ± 0.54 | 86.42 ± 0.33 | 91.58 ± 0.59 | 94.25 ± 0.38
CoOp + GM | 67.87 ± 0.31 | 69.09 ± 0.49 | 71.69 ± 0.68 | 75.76 ± 0.79 | 78.36 ± 0.34
CoOp + KD | 68.11 ± 1.47 | 71.02 ± 0.60 | 76.06 ± 0.19 | 84.53 ± 0.24 | 88.05 ± 0.16
CoOp + ProGrad | 73.18 ± 0.73 | 79.77 ± 0.65 | 85.37 ± 0.96 | 91.64 ± 0.24 | 94.37 ± 0.24…”

mentioning

confidence: 99%


Prompt-aligned Gradient for Prompt Tuning

Zhu, Niu, Han et al. 2022

Preprint


Thanks to the large pre-trained vision-language models (VLMs) like CLIP [36], we can craft a zero-shot classifier by "prompt", e.g., the confidence score of an image being "[CLASS]" can be obtained by using the VLM provided similarity measure between the image and the prompt sentence "a photo of a [CLASS]". Therefore, prompt shows a great potential for fast adaptation of VLMs to downstream tasks if we fine-tune the prompt-based similarity measure. However, we find a common failure that improper fine-tuning may not only undermine the prompt's inherent prediction for the task-related classes, but also for other classes in the VLM vocabulary. Existing methods still address this problem by using traditional anti-overfitting techniques such as early stopping and data augmentation, which lack a principled solution specific to prompt. We present Prompt-aligned Gradient, dubbed ProGrad, to prevent prompt tuning from forgetting the general knowledge learned from VLMs. In particular, ProGrad only updates the prompt whose gradient is aligned (or non-conflicting) to the "general direction", which is represented as the gradient of the KL loss of the pre-defined prompt prediction. Extensive experiments demonstrate the stronger few-shot generalization ability of ProGrad over state-of-the-art prompt tuning methods. Codes are available at https://github.com/BeierZhu/Prompt-align.
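The "aligned (or non-conflicting)" gradient rule in the abstract above can be illustrated with a small sketch. The abstract does not spell out the update, so the projection used here is an assumption: when the task gradient has a negative inner product with the general-direction gradient, the conflicting component is removed; otherwise the task gradient is kept unchanged. The function name `align_gradient` and its parameters are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def align_gradient(task_grad, general_grad, eps=1e-12):
    """Return a task gradient that does not conflict with general_grad.

    Assumption (not stated in the abstract): a conflict is a negative
    inner product, and it is resolved by projecting the task gradient
    onto the subspace orthogonal to the general-direction gradient.
    """
    task_grad = np.asarray(task_grad, dtype=float)
    general_grad = np.asarray(general_grad, dtype=float)

    dot = float(task_grad @ general_grad)
    if dot >= 0.0:
        # Already aligned (non-conflicting): use the task gradient as is.
        return task_grad

    # Remove the component that points against the general direction.
    scale = dot / (float(general_grad @ general_grad) + eps)
    return task_grad - scale * general_grad


if __name__ == "__main__":
    general = np.array([1.0, 0.0])
    print(align_gradient([1.0, 1.0], general))   # aligned case, unchanged
    print(align_gradient([-1.0, 1.0], general))  # conflicting case, projected
```

In the conflicting case the returned vector has an inner product of approximately zero with the general direction, so a step along it no longer moves against that direction.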


“…generalization for centralized training, second-order information is being found quite useful as a theoretical indicator. Recent works [19,21] find that forming representations that are "hard to vary" seem to result in better O.O.D. performance.…”

Section: Algorithm Analysis Based On Second-order Information

mentioning

confidence: 99%

“…Specifically, we carefully analyze the effectiveness of various data and structural regularization methods at reducing client drift and improving FL performance (Section 3). Utilizing second-order information and insights from out-of-distribution generality literature [19,21], we identify theoretical indicators for successful FL optimization, and evaluate across a variety of FL settings for empirical validation.…”

Section: Introduction

mentioning

confidence: 99%

Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning

Mendieta, Yang, Wang et al. 2021

Preprint


Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices). However, the data distribution among clients is often non-IID in nature, making efficient optimization difficult. To alleviate this issue, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by introducing a variety of proximal terms, some incurring considerable compute and/or memory overheads, to restrain local updates with respect to the global model. Instead, we consider rethinking solutions to data heterogeneity in FL with a focus on local learning generality rather than proximal restriction. To this end, we first present a systematic study informed by second-order indicators to better understand algorithm effectiveness in FL. Interestingly, we find that standard regularization methods are surprisingly strong performers in mitigating data heterogeneity effects. Based on our findings, we further propose a simple and effective method, FedAlign, to overcome data heterogeneity and the pitfalls of previous methods. FedAlign achieves competitive accuracy with state-of-the-art FL methods across a variety of settings while minimizing computation and memory overhead. Code will be publicly available.


“…GroupDRO [36] minimizes the worst-case risk across training domains. Recently, methods based on gradient matching have been proposed [31,37]. Finally, a few works introduce benchmarks for evaluating domain generalization methods [14,18].…”

Section: Related Work

mentioning

confidence: 99%

Failure Modes of Domain Generalization Algorithms

Galstyan, Harutyunyan, Khachatrian et al. 2021

Preprint


Domain generalization algorithms use training data from multiple domains to learn models that generalize well to unseen domains. While recently proposed benchmarks demonstrate that most of the existing algorithms do not outperform simple baselines, the established evaluation methods fail to expose the impact of various factors that contribute to the poor performance. In this paper we propose an evaluation framework for domain generalization algorithms that allows decomposition of the error into components capturing distinct aspects of generalization. Inspired by the prevalence of algorithms based on the idea of domain-invariant representation learning, we extend the evaluation framework to capture various types of failures in achieving invariance. We show that the largest contributor to the generalization error varies across methods, datasets, regularization strengths and even training lengths. We observe two problems associated with the strategy of learning domain-invariant representations. On Colored MNIST, most domain generalization algorithms fail because they reach domain-invariance only on the training domains. On Camelyon-17, domain-invariance degrades the quality of representations on unseen domains. We hypothesize that focusing instead on tuning the classifier on top of a rich representation can be a promising direction.
