“…Existing methods use side information such as multiple training environments [1,12,55], counterfactual examples [25,72], or non-stationary time series [23,30,57]. Importantly, OOD generalization is not achievable only through regularizers, network architectures, or unsupervised control of inductive biases [6]. To make this limitation intuitive, consider the task of image recognition in Figure 1.…”