“…Typically, this is done by constraining the perturbations with an ℓ_p-norm, where the most common settings use either ℓ_∞ [11,12,9,8,13,14,15], ℓ_2 [16,9,17,18,19], or ℓ_1 [20,21]. As of now, the state-of-the-art empirical defense against adversarial attacks is to iteratively retrain on adversarial examples [8]. While adversarial retraining by itself can improve robustness, it exhibits a fundamental trade-off between robustness and clean accuracy, as well as a lack of generalization across different attacks [22,23,24,25,26].…”
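As a rough illustration of the setup described above, the sketch below implements an ℓ_∞-constrained PGD attack and a single adversarial-retraining step in PyTorch. It is a minimal sketch, not the method of [8] or of this paper: the function names (`pgd_linf`, `adversarial_training_step`), the budget `eps = 8/255`, the step size `alpha`, and the step count are illustrative assumptions chosen for readability.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Illustrative PGD attack under an l_inf constraint (values are assumptions).

    Starts from a random point inside the eps-ball around x and takes
    `steps` signed-gradient ascent steps on the cross-entropy loss,
    projecting back onto the ball after every step.
    """
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss along the gradient sign (steepest ascent in l_inf).
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back onto the l_inf ball of radius eps around the clean input.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y, eps=8/255):
    """One adversarial-retraining step: train on PGD examples instead of clean ones."""
    model.eval()
    x_adv = pgd_linf(model, x, y, eps=eps)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Swapping the sign-based step and ℓ_∞ projection for a normalized-gradient step with an ℓ_2 (or ℓ_1) projection yields the other threat models cited above; the robustness/clean-accuracy trade-off noted in the passage arises regardless of which norm is used during retraining.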