2023
DOI: 10.48550/arxiv.2303.13500
Preprint

A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias

Abstract: Advances in the expressivity of pretrained models have increased interest in the design of adaptation protocols which enable safe and effective transfer learning. Going beyond conventional linear probing (LP) and fine-tuning (FT) strategies, protocols that can effectively control feature distortion, i.e., the failure to update features orthogonal to the in-distribution data, have been found to achieve improved out-of-distribution (OOD) generalization. In order to limit this distortion, the LP+FT protocol, which firs…
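To make the protocols named in the abstract concrete, here is a minimal sketch of LP, FT, and the two-stage LP+FT recipe in PyTorch. The backbone choice (a torchvision ResNet-50), the 10-class head, and all hyperparameters are illustrative assumptions, not the setup used in the paper.

```python
# A minimal sketch of the LP, FT, and LP+FT adaptation protocols.
# Assumptions: torchvision ResNet-50 backbone, a generic DataLoader named
# `train_loader`; epochs and learning rates are illustrative only.
import torch
import torch.nn as nn
from torchvision import models

def linear_probe(model, train_loader, epochs=10, lr=1e-2):
    """LP: freeze the pretrained backbone and train only the classification head."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():           # head of a torchvision ResNet
        p.requires_grad = True
    opt = torch.optim.SGD(model.fc.parameters(), lr=lr, momentum=0.9)
    _train(model, train_loader, opt, epochs)

def fine_tune(model, train_loader, epochs=10, lr=1e-3):
    """FT: update every parameter, typically with a smaller learning rate."""
    for p in model.parameters():
        p.requires_grad = True
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    _train(model, train_loader, opt, epochs)

def lp_then_ft(model, train_loader):
    """LP+FT: probe first so the head is aligned before the backbone is unfrozen,
    limiting how much the pretrained features are distorted early in FT."""
    linear_probe(model, train_loader)
    fine_tune(model, train_loader)

def _train(model, loader, opt, epochs):
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

model = models.resnet50(weights="IMAGENET1K_V2")
model.fc = nn.Linear(model.fc.in_features, 10)   # new head for a 10-class target task
```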

Cited by 1 publication (2 citation statements) | References 12 publications
“…One promising technique is to stop the LP stage early before it reaches convergence, resulting in a non-converged head that improves performance during the subsequent FT stage (Ren et al 2023). Additionally, applying hardness-promoting augmentation during the LP stage can help mitigate feature distortion and also simplicity bias, leading to enhanced generalization performance (Trivedi, Koutra, and Thiagarajan 2023). Although these head initialization techniques have shown effectiveness in boosting the performance of the FT stage, they require a separate training stage to initialize the head layer before the FT stage.…”
Section: Related Work
confidence: 99%
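The two head-initialization tweaks described in the statement above, an early-stopped LP stage and hardness-promoting augmentation during LP, can be sketched as a modified probing stage that feeds into the FT stage from the earlier sketch. This is a sketch only: the two-epoch budget and the use of MixUp as the hardness-promoting augmentation are illustrative stand-ins, not the exact recipes of Ren et al. (2023) or Trivedi, Koutra, and Thiagarajan (2023); a torchvision-style model with a `.fc` head is assumed.

```python
# Sketch of an early-stopped, augmentation-driven LP stage used to initialize the
# head before FT. The augmentation (MixUp) and the epoch budget are assumptions.
import torch

def mixup_batch(x, y, alpha=0.2):
    """Illustrative hardness-promoting augmentation: convexly mix pairs of examples."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], y, y[perm], lam

def early_stopped_lp_with_mixup(model, train_loader, lp_epochs=2, lr=1e-2):
    """LP stopped well before convergence and trained on mixed inputs, so the head
    handed to FT is neither fully converged nor fit to easy shortcut features."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():
        p.requires_grad = True
    opt = torch.optim.SGD(model.fc.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(lp_epochs):                     # deliberately few epochs
        for x, y in train_loader:
            x_mix, y_a, y_b, lam = mixup_batch(x, y)
            opt.zero_grad()
            out = model(x_mix)
            loss = lam * loss_fn(out, y_a) + (1 - lam) * loss_fn(out, y_b)
            loss.backward()
            opt.step()
```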
“…However, in transfer learning scenarios, where the data distribution between the source domain and target domain may significantly differ, the movement of statistics within batch normalization layers can result in severe feature distortion. As a solution, many methods choose to freeze the batch normalization statistics during fine-tuning to alleviate feature distortion (Kumar et al 2022;Ren et al 2023;Trivedi, Koutra, and Thiagarajan 2023). On the other hand, some methods leverage batch normalization with domain-specific statistics to encourage the learning of more generalized features (Wang et al 2019;Chang et al 2019) or even use statistics from test batches to enhance robustness against domain shift (Mirza et al 2022;Lim et al 2023).…”
Section: Related Work
confidence: 99%
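A minimal sketch of the batch-normalization freezing described in the last statement: put every BatchNorm layer in eval mode during fine-tuning so that source-domain running statistics are not overwritten by target-domain batches. Assumptions: a PyTorch model with standard torch.nn BatchNorm layers; only the running statistics are frozen here while the affine parameters stay trainable, which is one common variant rather than the specific recipe of the cited works.

```python
# Freeze batch-normalization running statistics during fine-tuning to limit
# feature distortion under a source/target distribution shift.
import torch.nn as nn

def freeze_bn_stats(model: nn.Module) -> nn.Module:
    """Put every BatchNorm layer in eval mode so fine-tuning on the target domain
    does not overwrite the running mean/variance learned on the source domain."""
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()          # use stored running stats; do not update them
    return model

# During FT, re-apply after every call to model.train(), since train() flips
# BatchNorm layers back to training mode:
#   model.train()
#   freeze_bn_stats(model)
```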