2019
DOI: 10.48550/arxiv.1911.07198
Preprint

Smoothed Inference for Adversarially-Trained Models

Yaniv Nemcovsky,
Evgenii Zheltonozhskii,
Chaim Baskin
et al.

Abstract: Deep neural networks are known to be vulnerable to inputs with maliciously constructed adversarial perturbations aimed at forcing misclassification. We study randomized smoothing as a way to both improve performance on unperturbed data and increase robustness to adversarial attacks. Moreover, we extend the method proposed by He et al. [16] by adding low-rank multivariate noise, which we then use as a base model for smoothing. The proposed method achieves 58.5% top-1 accuracy on CIFAR-10 under PGD attack…
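The core smoothing idea the abstract refers to can be sketched generically: classify many noisy copies of the input and take a majority vote. The sketch below uses isotropic Gaussian noise and a toy nearest-centroid classifier as stand-ins; the paper's actual contribution replaces the isotropic noise with learned low-rank multivariate noise on an adversarially-trained base model, which is not reproduced here.

```python
import numpy as np

def smoothed_predict(classify, x, sigma=0.25, n_samples=100, rng=None):
    """Randomized-smoothing inference: classify n_samples noisy copies
    of x and return the majority-vote label.  `classify` maps a batch
    of inputs to integer labels.  Generic sketch only; the paper's
    variant uses learned low-rank multivariate noise instead of the
    i.i.d. isotropic Gaussian used here."""
    rng = np.random.default_rng(rng)
    # Sample n_samples perturbed copies of x.
    noise = rng.normal(scale=sigma, size=(n_samples,) + x.shape)
    labels = classify(x[None, :] + noise)
    # Majority vote over the noisy predictions.
    return int(np.bincount(labels).argmax())

# Toy base classifier (hypothetical): nearest of two fixed centroids.
centroids = np.array([[0.0, 0.0], [2.0, 2.0]])
def classify(batch):
    d = ((batch[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

label = smoothed_predict(classify, np.array([0.2, 0.1]), rng=0)  # class 0
```

Averaging over noise makes the decision depend on a neighborhood of the input rather than a single point, which is what buys robustness to small adversarial perturbations.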

Cited by 2 publications (2 citation statements)
References 28 publications
“…Goodfellow, Shlens, and Szegedy (2015) introduced the one-step Fast Gradient Sign Method (FGSM) attack, which was followed by more effective iterative attacks such as (Kurakin, Goodfellow, and Bengio 2016), the PGD attack (Madry et al. 2018), the Carlini-Wagner attack (Carlini and Wagner 2017), the Momentum Iterative attack (Dong et al. 2018), the Diverse Input Iterative attack (Xie et al. 2019b), and the Jacobian-based saliency map approach (Papernot et al. 2016). A parallel line of work has emerged on strategies to defend against stronger adversarial attacks, such as Adversarial Training (Madry et al. 2018), Adversarial Logit Pairing (Kannan, Kurakin, and Goodfellow 2018), Ensemble Adversarial Training (Tramèr et al. 2018), Parseval Networks (Cisse et al. 2017), Feature Denoising Training (Xie et al. 2019a), Latent Adversarial Training (Kumari et al. 2019), Jacobian Adversarial Regularizer (Chan et al. 2020), and Smoothed Inference (Nemcovsky et al. 2019). The recent work of Zhang et al. (2019) explored the trade-off between adversarial robustness and accuracy.…”
Section: Related Work (mentioning)
confidence: 99%
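The attacks named in the statement above follow one template: perturb the input in the direction of the loss gradient. A minimal sketch, using a logistic-regression model with an analytic gradient so it runs without a deep-learning framework (real attacks backpropagate through a network):

```python
import numpy as np

def fgsm(x, y, w, b, eps=0.1):
    """One-step FGSM (Goodfellow, Shlens, and Szegedy 2015) against a
    logistic-regression model p(y=1|x) = sigmoid(w.x + b): move x by
    eps in the sign of the cross-entropy loss gradient."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # model confidence in class 1
    grad_x = (p - y) * w                     # d(cross-entropy)/dx, analytic
    return x + eps * np.sign(grad_x)

def pgd(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """PGD (Madry et al. 2018): iterate small FGSM steps, projecting
    back into the L-infinity ball of radius eps around the original x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = fgsm(x_adv, y, w, b, eps=alpha)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

Iterating and projecting is what makes PGD a stronger attack than single-step FGSM within the same perturbation budget, which is why it is the standard benchmark attack in the defenses listed above.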