Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks

Malinin, Andrey; Band, Neil; Ganshin, Alexander; Chesnokov, German; Gal, Yarin; Gales, Mark J. F.; Noskov, Alexey; Ploskonosov, Andrey; Prokhorenkova, Liudmila; Provilkov, Ivan; Raina, Vatsal; Raina, Vyas; Roginskiy, Denis; Shmatova, Mariya; Tigas, Panos; Yangel, Boris

doi:10.48550/arxiv.2107.07455

Cited by 16 publications

(21 citation statements)

References 38 publications


“…The Vehicle Motion prediction part of Shifts Dataset [16] contains 5 seconds of past and 5 seconds of future states for all agents in a scene along with overall scene features. The goal of the challenge is to build a model that predicts k ≤ 5 future trajectories $\tilde{y}_i^k$ in the horizon of T = 25 timesteps along with their confidences $\omega^k$ and overall scene uncertainty $U$ for each scene $x_i$.…”

Section: Problem Statement (mentioning)

confidence: 99%

Estimating Uncertainty For Vehicle Motion Prediction on Yandex Shifts Dataset

Pustynnikov, Eremeev

2021

Preprint


Motion prediction of surrounding agents is an important task in the context of autonomous driving, since it is closely related to driver safety. The Vehicle Motion Prediction (VMP) track of the Shifts Challenge focuses on developing models that are robust to distributional shift and able to measure the uncertainty of their predictions. In this work we present an approach that significantly improved on the provided benchmark and took 2nd place on the leaderboard.
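As a rough illustration of the prediction target described in the citation statement above (up to 5 candidate trajectories over 25 timesteps, one confidence per trajectory, and a single scene-level uncertainty), the following Python sketch defines a container that checks those constraints. The class name `ScenePrediction`, its fields, and the use of 2D (x, y) points are assumptions made for illustration; they are not the challenge's actual submission format.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_MODES = 5   # k <= 5 trajectories, per the quoted statement
HORIZON = 25    # T = 25 future timesteps, per the quoted statement

# Assumption: each future state is a 2D (x, y) position.
Point = Tuple[float, float]


@dataclass
class ScenePrediction:
    """Hypothetical container for the model output of one scene."""

    trajectories: List[List[Point]]  # k candidate future paths
    confidences: List[float]         # one confidence per trajectory
    uncertainty: float               # scene-level uncertainty U

    def __post_init__(self) -> None:
        k = len(self.trajectories)
        if not 1 <= k <= MAX_MODES:
            raise ValueError(f"expected 1..{MAX_MODES} trajectories, got {k}")
        if any(len(t) != HORIZON for t in self.trajectories):
            raise ValueError(f"every trajectory needs {HORIZON} timesteps")
        if len(self.confidences) != k:
            raise ValueError("need exactly one confidence per trajectory")

    def most_likely(self) -> List[Point]:
        """Return the trajectory with the highest confidence."""
        best = max(range(len(self.confidences)),
                   key=self.confidences.__getitem__)
        return self.trajectories[best]
```

Validating the shape at construction time keeps malformed predictions (too many modes, wrong horizon) from reaching any downstream scoring code.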


“…We use F1@95 score to jointly evaluate uncertainty and robustness. A good uncertainty measure should achieve low R-AUC, high F1-AUC and high F1@95 scores [13]. These are presented in Table 2 .…”

Section: Ll-fisher Uncertainty (Ll-fu) (mentioning)

confidence: 99%

“…Most of the available datasets such as ImageNet-C [3], -A [5], -R [4], -O [5] and WILDS [7] focus primarily on image classification tasks. The recently introduced Shifts Dataset [13] provides a favourable data setting. It is composed of three parts each corresponding to a different data modality: tabular weather prediction data, machine translation data and self-driving car data.…”

Section: Introduction (mentioning)

confidence: 99%


Evaluating Predictive Uncertainty and Robustness to Distributional Shift Using Real World Data

Lakara, Bhandari, Pratinav et al.

2021

Preprint


Most machine learning models operate under the assumption that the training, testing and deployment data is independent and identically distributed (i.i.d.). This assumption doesn't generally hold true in a natural setting. Usually, the deployment data is subject to various types of distributional shifts. The magnitude of a model's performance is proportional to this shift in the distribution of the dataset. Thus it becomes necessary to evaluate a model's uncertainty and robustness to distributional shifts to get a realistic estimate of its expected performance on real-world data. Present methods to evaluate uncertainty and model's robustness are lacking and often fail to paint the full picture. Moreover, most analysis so far has primarily focused on classification tasks. In this paper, we propose more insightful metrics for general regression tasks using the Shifts Weather Prediction Dataset. We also present an evaluation of the baseline methods using these metrics.
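R-AUC, mentioned in the citation statement above, is commonly computed as the area under an error-retention curve: predictions are ranked by uncertainty, the most uncertain ones are handed to an oracle (so they contribute zero error), and the mean error is tracked as more predictions are retained. The sketch below is a minimal, assumed version of that construction. The function names are hypothetical, and the cited papers may differ in details such as the error measure or the integration rule. F1-AUC and F1@95 are not covered here.

```python
from typing import List, Sequence


def retention_curve(errors: Sequence[float],
                    uncertainties: Sequence[float]) -> List[float]:
    """Mean error at each retention level 0/n, 1/n, ..., n/n.

    Assumption: rejected (most uncertain) predictions are replaced by an
    oracle and therefore contribute zero error.
    """
    if len(errors) != len(uncertainties):
        raise ValueError("errors and uncertainties must have equal length")
    n = len(errors)
    if n == 0:
        raise ValueError("need at least one prediction")
    # Retain the least uncertain predictions first.
    order = sorted(range(n), key=lambda i: uncertainties[i])
    curve = [0.0]
    running = 0.0
    for i in order:
        running += errors[i]
        curve.append(running / n)
    return curve


def r_auc(errors: Sequence[float], uncertainties: Sequence[float]) -> float:
    """Area under the error-retention curve (trapezoidal rule); lower is better."""
    curve = retention_curve(errors, uncertainties)
    n = len(curve) - 1
    return sum((curve[j] + curve[j + 1]) / 2 for j in range(n)) / n


# Toy data: uncertainty that tracks the error yields a smaller area
# than uncertainty that is anti-correlated with it.
toy_errors = [0.0, 1.0, 2.0, 3.0]
informative = r_auc(toy_errors, [0.1, 0.2, 0.3, 0.4])
misleading = r_auc(toy_errors, [0.4, 0.3, 0.2, 0.1])
```

At full retention the curve equals the plain mean error, so two uncertainty measures for the same model differ only in how quickly the curve rises, which is what the area summarizes.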


“…Nowadays, few researchers are surprised by NNs performing very well on some in-domain data distribution, and there has been increasing interest in developing models that are robust to domain shifts [8,9,10,11]. Here we focus on the recently proposed benchmark for evaluating domain robust systems, WILDS [11], and we share our empirical experience with two datasets of WILDS, iWildCam and FMoW, as well as their baseline models.…”

Section: Introduction (mentioning)

confidence: 99%

Improving Baselines in the Wild

Irie, Schlag, Csordás et al.

2021

Preprint


We share our experience with the recently released WILDS benchmark, a collection of ten datasets dedicated to developing models and training strategies which are robust to domain shifts. Several experiments yield a couple of critical observations which we believe are of general interest for any future work on WILDS. Our study focuses on two datasets: iWildCam and FMoW. We show that (1) Conducting separate cross-validation for each evaluation metric is crucial for both datasets, (2) A weak correlation between validation and test performance might make model development difficult for iWildCam, (3) Minor changes in the training hyperparameters improve the baseline by a relatively large margin (mainly on FMoW), (4) There is a strong correlation between certain domains and certain target labels (mainly on iWildCam). To the best of our knowledge, no prior work on these datasets has reported these observations despite their obvious importance. Our code is public.
