“…Improving the generalization of deep learning models has become a major research topic, with many different threads of research including Bayesian deep learning (Neal, 1996; Gal, 2016), adversarial (Engstrom et al., 2019; Jacobsen et al., 2018) and non-adversarial (Hendrycks & Dietterich, 2019; Yin et al., 2019) robustness, causality (Arjovsky et al., 2019), and other works aimed at distinguishing statistical features from semantic features (Gowal et al., 2019; Geirhos et al., 2018). While neural networks often exhibit superhuman generalization performance on the training distribution, they can be extremely sensitive to minute changes in distribution (Su et al., 2019; Engstrom et al., 2017). In this work, we consider out-of-distribution (OoD) generalization, where a model must generalize to new distributions at test time without seeing any training data from them.…”
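The OoD setup described in the last sentence can be made concrete with a small sketch: a model is fit on data pooled from several training environments and then evaluated on a test environment whose distribution it never saw. The toy environments, the spurious-correlation parameter `shift`, and the least-squares model below are illustrative assumptions for exposition, not the paper's actual experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_env(n, shift):
    """Toy environment: y depends on a stable feature, while a second,
    spurious feature correlates with y by an environment-specific `shift`."""
    stable = rng.normal(size=n)
    y = stable + 0.1 * rng.normal(size=n)
    spurious = shift * y + rng.normal(size=n)
    X = np.column_stack([stable, spurious])
    return X, y

# Training environments share a distribution family but differ slightly
# in how strongly the spurious feature tracks the label.
train_envs = [make_env(1000, shift) for shift in (0.9, 0.8)]
X_train = np.vstack([X for X, _ in train_envs])
y_train = np.concatenate([y for _, y in train_envs])

# Ordinary least squares fit on the pooled training environments.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# OoD test environment: the spurious correlation is reversed at test time,
# so a model that leaned on the spurious feature degrades sharply.
X_test, y_test = make_env(1000, shift=-0.9)
mse = np.mean((X_test @ w - y_test) ** 2)
print(f"OoD test MSE: {mse:.3f}")
```

Under this protocol, performance is reported only on the held-out distribution; no samples from the test environment are available during training.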