2022
DOI: 10.48550/arxiv.2201.04234
Preprint

Leveraging Unlabeled Data to Predict Out-of-Distribution Performance

Abstract: Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions that may cause performance drops. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
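
To make the thresholding concrete, here is a minimal NumPy sketch of the ATC recipe as the abstract describes it. It assumes the threshold is fit so that the fraction of source examples above it matches source accuracy, and it uses max-softmax scores as "confidence"; the function name and the toy numbers are illustrative, not the paper's reference implementation.

```python
import numpy as np

def atc_predict_accuracy(source_conf, source_correct, target_conf):
    """Sketch of Average Thresholded Confidence (ATC).

    Fit a threshold t on labeled source data so that the fraction of
    source examples with confidence above t equals source accuracy,
    then predict target accuracy as the fraction of unlabeled target
    examples whose confidence exceeds t.
    """
    source_acc = np.mean(source_correct)
    # The (1 - source_acc) quantile leaves a source_acc fraction of
    # the source confidences above the threshold.
    t = np.quantile(source_conf, 1.0 - source_acc)
    return float(np.mean(target_conf > t))

# Toy usage with max-softmax confidences (illustrative numbers):
source_conf = np.array([0.95, 0.90, 0.80, 0.60, 0.55])
source_correct = np.array([1, 1, 1, 0, 0])
target_conf = np.array([0.92, 0.70, 0.58, 0.85])
print(atc_predict_accuracy(source_conf, source_correct, target_conf))
```

The appeal of this recipe is that the target side needs no labels at all: only the model's scores on unlabeled target examples enter the estimate.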

Cited by 8 publications (15 citation statements)
References 28 publications
“…OOD Robustness Hendrycks, Liu, Wallace, Dziedzic, Krishnan, and Song (2020b); Radford et al (2021) show that large pretrained models are more robust to distribution shift, and Desai and Durrett (2020) show that large pretrained models are better calibrated on OOD inputs. There is also a long line of literature on OOD detection (Hendrycks and Gimpel, 2016; Geifman and El-Yaniv, 2017; Liang, Li, and Srikant, 2017; Lakshminarayanan, Pritzel, and Blundell, 2016; Jiang, Kim, Guan, and Gupta, 2018; Zhang, Li, Guo, and Guo, 2020), uncertainty estimation (Ovadia, Fertig, Ren, Nado, Sculley, Nowozin, Dillon, Lakshminarayanan, and Snoek, 2019), and accuracy prediction (Deng and Zheng, 2021; Guillory, Shankar, Ebrahimi, Darrell, and Schmidt, 2021; Garg, Balakrishnan, Lipton, Neyshabur, and Sedghi, 2022) under distribution shift. Our work can be seen as an extreme version of "distribution shift", using distributions focused on a single point.…”
Section: Related Work
confidence: 99%
“…Model selection on out-of-distribution (OOD) data is an important and challenging problem, as noted by several authors [22,29,58,12]. [57,11] propose solutions specific to covariate shift based on the parametric bootstrap and reweighting; [20] align model confidence and accuracy with a threshold; [27,10] train several models and use their ensembles or disagreement. Our importance weighting approach is computationally simpler than the latter and more flexible than the former, as it allows for concept drift and can be used in downstream tasks beyond model selection, as we demonstrate both theoretically and empirically.…”
Section: Related Work
confidence: 99%
“…We evaluate the ability to choose a model for the target domain based on accuracy on the ExTRA-reweighted source validation data. We compare to standard source-validation model selection (SrcVal) and to the recently proposed ATC-NE [20], which uses the negative entropy of the predicted probabilities on the target domain to score models. We fit a total of 120 logistic regression models with different weightings (uniform, label balancing, and group balancing) and varying regularizers.…”
Section: Model Fine-tuning
confidence: 99%
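
For reference, ATC-NE as cited above swaps the max-softmax score for a negative-entropy score of the predicted class probabilities. A minimal sketch of that score, assuming the same thresholding as in the ATC sketch earlier (the function name is ours, not from the cited work):

```python
import numpy as np

def negative_entropy(probs, eps=1e-12):
    """Negative entropy of each row of predicted class probabilities
    (shape: n_examples x n_classes). Values closer to zero indicate
    more peaked, i.e. more confident, predictions."""
    return np.sum(probs * np.log(probs + eps), axis=1)

# Peaked vs. near-uniform predictions (illustrative numbers):
probs = np.array([[0.90, 0.05, 0.05],
                  [0.34, 0.33, 0.33]])
print(negative_entropy(probs))  # first row scores much closer to zero

# These scores then play the role of "confidence" in ATC: fit a
# threshold on source-side scores and report the fraction of target
# examples whose score exceeds it.
```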
“…A separate line of work departs from complexity measures altogether and directly predicts OOD generalization from unlabelled test data. These methods either predict the model's correctness on individual examples [14,32,15] or estimate the total error directly [19,24,9,10,68]. Although these methods work well in practice, they provide little insight into the underlying mechanism of generalization, since they act only on the output layer of the network.…”
Section: Related Work
confidence: 99%
“…Directly estimating the generalization of a trained model on test data is one approach to this problem [14,32,15,19]. However, these estimates are typically computed from the model's output predictive distribution, which can become poorly calibrated in out-of-domain settings.…”
Section: Introduction
confidence: 99%