2021
DOI: 10.48550/arxiv.2109.02934
Preprint

Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization

Abstract: Learning robust models that generalize well under changes in the data distribution is critical for real-world applications. To this end, there has been a growing surge of interest in learning simultaneously from multiple training domains, while enforcing different types of invariance across those domains. Yet, all existing approaches fail to show systematic benefits under fair evaluation protocols. In this paper, we propose a new learning scheme to enforce domain invariance in the space of the gradients of the loss…
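As a rough illustration of the gradient-space invariance described in the abstract, here is a minimal sketch of matching per-domain gradient variances in PyTorch. The helper names (per_sample_classifier_grads, fishr_style_penalty), the restriction to a linear classifier head, and the overall structure are assumptions for the example, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def per_sample_classifier_grads(features, labels, classifier):
    # Per-sample gradients of the cross-entropy loss w.r.t. the classifier parameters.
    grads = []
    for x, y in zip(features, labels):
        logits = classifier(x.unsqueeze(0))
        loss = F.cross_entropy(logits, y.unsqueeze(0))
        g = torch.autograd.grad(loss, list(classifier.parameters()), create_graph=True)
        grads.append(torch.cat([p.reshape(-1) for p in g]))
    return torch.stack(grads)  # shape: (n_samples, n_classifier_params)

def fishr_style_penalty(domain_batches, featurizer, classifier):
    # Penalize the distance between each domain's element-wise gradient variance
    # and the mean variance across training domains.
    variances = []
    for inputs, labels in domain_batches:  # one (inputs, labels) batch per training domain
        grads = per_sample_classifier_grads(featurizer(inputs), labels, classifier)
        variances.append(grads.var(dim=0))
    mean_variance = torch.stack(variances).mean(dim=0)
    return sum((v - mean_variance).pow(2).sum() for v in variances) / len(variances)
```

In training, such a penalty would typically be added to the averaged empirical risk with a trade-off coefficient, e.g. loss = erm_loss + lambda_reg * fishr_style_penalty(...), where lambda_reg is a hyperparameter assumed here for illustration.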

Cited by 8 publications (13 citation statements)
References 28 publications

“…In addition, incremental learning requires old data from memory storage while our prompt-based learning method has no access to the pre-trained data. Another related field that leverages gradient matching to transfer knowledge is domain generalization [44,37] and multi-task learning [43,52]. However, their methods are not directly applicable in prompt tuning whose transfer direction is only from general to downstream.…”
Section: Related Work (mentioning, confidence: 99%)
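The gradient matching mentioned in the snippet above can be sketched as aligning the gradients that different domains (or tasks) induce on shared parameters. The cosine-similarity form below is an assumed illustration, not necessarily the exact objective of the cited works.

```python
import torch
import torch.nn.functional as F

def gradient_alignment_penalty(loss_a, loss_b, params):
    # Negative cosine similarity between the gradients of two domains' losses:
    # minimizing it encourages both domains' gradients to point in the same direction.
    params = list(params)
    g_a = torch.autograd.grad(loss_a, params, create_graph=True, retain_graph=True)
    g_b = torch.autograd.grad(loss_b, params, create_graph=True)
    flat_a = torch.cat([g.reshape(-1) for g in g_a])
    flat_b = torch.cat([g.reshape(-1) for g in g_b])
    return -F.cosine_similarity(flat_a, flat_b, dim=0)
```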
“…* indicates results copied from [7].” [Table: few-shot accuracy (mean ± std) for Cosine, CoOp, ProGrad, CoOp + l2 prompt reg, CoOp + GM, CoOp + KD, and CoOp + ProGrad on ImageNet, Caltech101, OxfordPets, StanfordCars, and Flowers102.]
(mentioning, confidence: 99%)
“…generalization for centralized training, second-order information is being found quite useful as a theoretical indicator. Recent works [19,21] find that forming representations that are "hard to vary" seem to result in better O.O.D. performance.…”
Section: Algorithm Analysis Based on Second-order Information (mentioning, confidence: 99%)
“…Specifically, we carefully analyze the effectiveness of various data and structural regularization methods at reducing client drift and improving FL performance (Section 3). Utilizing second-order information and insights from out-of-distribution generality literature [19,21], we identify theoretical indicators for successful FL optimization, and evaluate across a variety of FL settings for empirical validation.…”
Section: Introduction (mentioning, confidence: 99%)
“…GroupDRO [36] minimizes the worst-case risk across training domains. Recently, methods based on gradient matching have been proposed [31,37]. Finally, a few works introduce benchmarks for evaluating domain generalization methods [14,18].…”
Section: Related Work (mentioning, confidence: 99%)
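As a reference point for the last snippet, the worst-case-risk objective attributed to GroupDRO can be written as a min-max problem over the training domains; the notation below (with the set of training domains, the per-domain distribution P_e, and the loss ℓ) is assumed for illustration.

```latex
\min_{\theta} \; \max_{e \in \mathcal{E}_{\text{train}}} R_e(\theta),
\qquad
R_e(\theta) = \mathbb{E}_{(x, y) \sim P_e}\!\left[ \ell\big(f_\theta(x), y\big) \right]
```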