“…We provide two leaderboards for the CIFAR-10 dataset: one for average-case performance, which ranks defenses by $\mathrm{CR}_{\text{ind-avg}}$, and one for worst-case performance, which ranks defenses by $\mathrm{CR}_{\text{ind-worst}}$. Our leaderboard contains evaluations for 16 pretrained models, all of which use training-based defenses, including techniques for training on unions of $\ell_p$ norms (Maini et al., 2020; Tramèr & Boneh, 2019; Madaan et al., 2020), training with novel threat models (Laidlaw et al., 2021), regularization-based approaches (Jin & Rinard, 2020; Dai et al., 2022), and $\ell_p$-norm adversarial training (Madry et al., 2018; Zhang et al., 2019; Rebuffi et al., 2021). We include details of the models present on the leaderboard in Appendix D. We note that these models are trained with either $\ell_2$ attacks with $\epsilon = 0.5$, $\ell_\infty$ attacks with $\epsilon = 8/255$, LPIPS attacks with $\epsilon = 1$, or the union of $\ell_1$ ($\epsilon = 2000/255$), $\ell_2$ ($\epsilon = 128/255$), and $\ell_\infty$ ($\epsilon = 8/255$) attacks.…”
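To make the union threat model quoted above concrete, the following is a minimal sketch of a membership check: a perturbation belongs to the union if it fits inside at least one of the three norm balls ($\ell_1$ with budget 2000/255, $\ell_2$ with 128/255, $\ell_\infty$ with 8/255). The names `UNION_BUDGETS`, `lp_norm`, and `satisfied_norms` are hypothetical, and the sketch assumes pixel values scaled to [0, 1] and a perturbation given as a flat list of floats; neither assumption is stated in the excerpt.

```python
from fractions import Fraction

INF = float("inf")

# Budgets for the union threat model as quoted in the text.
# Assumption: pixel values are scaled to [0, 1].
UNION_BUDGETS = {
    1: Fraction(2000, 255),
    2: Fraction(128, 255),
    INF: Fraction(8, 255),
}


def lp_norm(delta, p):
    """Return the l_p norm of a flat list of perturbation values."""
    if p == INF:
        return max(abs(x) for x in delta)
    return sum(abs(x) ** p for x in delta) ** (1.0 / p)


def satisfied_norms(delta, budgets=UNION_BUDGETS, tol=1e-12):
    """List the norms p whose budget the perturbation respects.

    The perturbation lies in the union threat model exactly when
    the returned list is non-empty.
    """
    return [p for p, eps in budgets.items()
            if lp_norm(delta, p) <= float(eps) + tol]


# A CIFAR-10 image has 32 * 32 * 3 = 3072 values.
DIM = 32 * 32 * 3
dense = [8 / 255] * DIM            # small change to every pixel
sparse = [1.0] + [0.0] * (DIM - 1)  # large change to a single pixel
large = [0.5] * DIM                 # large change to every pixel
```

Here `dense` fits only the $\ell_\infty$ ball and `sparse` fits only the $\ell_1$ ball, so both are inside the union, while `large` violates all three budgets. This illustrates why a model trained against the union must handle qualitatively different perturbations.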