2022
DOI: 10.48550/arxiv.2201.04234
Preprint

Leveraging Unlabeled Data to Predict Out-of-Distribution Performance

Abstract: Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions that may cause performance drops. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
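
To make the thresholding concrete, here is a minimal NumPy sketch of the ATC recipe as the abstract describes it. It assumes the threshold is fit so that the fraction of source examples above it matches source accuracy, and it uses max-softmax scores as "confidence"; the function name and the toy numbers are illustrative, not the paper's reference implementation.

```python
import numpy as np

def atc_predict_accuracy(source_conf, source_correct, target_conf):
    """Sketch of Average Thresholded Confidence (ATC).

    Fit a threshold t on labeled source data so that the fraction of
    source examples with confidence above t equals source accuracy,
    then predict target accuracy as the fraction of unlabeled target
    examples whose confidence exceeds t.
    """
    source_acc = np.mean(source_correct)
    # The (1 - source_acc) quantile leaves a source_acc fraction of
    # the source confidences above the threshold.
    t = np.quantile(source_conf, 1.0 - source_acc)
    return float(np.mean(target_conf > t))

# Toy usage with max-softmax confidences (illustrative numbers):
source_conf = np.array([0.95, 0.90, 0.80, 0.60, 0.55])
source_correct = np.array([1, 1, 1, 0, 0])
target_conf = np.array([0.92, 0.70, 0.58, 0.85])
print(atc_predict_accuracy(source_conf, source_correct, target_conf))
```

The appeal of this recipe is that the target side needs no labels at all: only the model's scores on unlabeled target examples enter the estimate.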

Cited by 8 publications (15 citation statements)
References 28 publications
“…OOD Robustness Hendrycks, Liu, Wallace, Dziedzic, Krishnan, and Song (2020b); Radford et al (2021) show that large pretrained models are more robust to distribution shift, and Desai and Durrett (2020) show that large pretrained models are better calibrated on OOD inputs. There is also a long line of literature on OOD detection (Hendrycks and Gimpel, 2016; Geifman and El-Yaniv, 2017; Liang, Li, and Srikant, 2017; Lakshminarayanan, Pritzel, and Blundell, 2016; Jiang, Kim, Guan, and Gupta, 2018; Zhang, Li, Guo, and Guo, 2020), uncertainty estimation (Ovadia, Fertig, Ren, Nado, Sculley, Nowozin, Dillon, Lakshminarayanan, and Snoek, 2019), and accuracy prediction (Deng and Zheng, 2021; Guillory, Shankar, Ebrahimi, Darrell, and Schmidt, 2021; Garg, Balakrishnan, Lipton, Neyshabur, and Sedghi, 2022) under distribution shift. Our work can be seen as an extreme version of "distribution shift", using distributions focused on a single point.…”
Section: Related Work
confidence: 99%
“…Model selection on out-of-distribution (OOD) data is an important and challenging problem, as noted by several authors [22,29,58,12]. [57,11] propose solutions specific to covariate shift based on the parametric bootstrap and reweighting; [20] align model confidence and accuracy with a threshold; [27,10] train several models and use their ensembles or disagreement. Our importance weighting approach is computationally simpler than the latter and more flexible than the former, as it allows for concept drift and can be used in downstream tasks beyond model selection, as we demonstrate both theoretically and empirically.…”
Section: Related Work
confidence: 99%
“…We evaluate the ability to choose a model for the target domain based on accuracy on the ExTRA-reweighted source validation data. We compare to standard source-validation model selection (SrcVal) and to the recently proposed ATC-NE [20], which uses the negative entropy of the predicted probabilities on the target domain to score models. We fit a total of 120 logistic regression models with different weightings (uniform, label balancing, and group balancing) and varying regularizers.…”
Section: Model Fine-tuning
confidence: 99%
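
For reference, ATC-NE as cited above swaps the max-softmax score for a negative-entropy score of the predicted class probabilities. A minimal sketch of that score, assuming the same thresholding as in the ATC sketch earlier (the function name is ours, not from the cited work):

```python
import numpy as np

def negative_entropy(probs, eps=1e-12):
    """Negative entropy of each row of predicted class probabilities
    (shape: n_examples x n_classes). Values closer to zero indicate
    more peaked, i.e. more confident, predictions."""
    return np.sum(probs * np.log(probs + eps), axis=1)

# Peaked vs. near-uniform predictions (illustrative numbers):
probs = np.array([[0.90, 0.05, 0.05],
                  [0.34, 0.33, 0.33]])
print(negative_entropy(probs))  # first row scores much closer to zero

# These scores then play the role of "confidence" in ATC: fit a
# threshold on source-side scores and report the fraction of target
# examples whose score exceeds it.
```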
“…A separate line of work departs from complexity measures altogether and directly predicts OOD generalization from unlabelled test data. These methods either predict the model's correctness on individual examples [14,32,15] or estimate the total error directly [19,24,9,10,68]. Although these methods work well in practice, they provide little insight into the underlying mechanism of generalization, since they act only on the output layer of the network.…”
Section: Related Work
confidence: 99%
“…Directly estimating the generalization of a trained model on test data is one approach to this problem [14,32,15,19]. However, these estimates are typically computed from the model's output predictive distribution, which can become poorly calibrated in out-of-domain settings.…”
Section: Introduction
confidence: 99%