The widespread deployment of deep learning models in practice necessitates an assessment of their vulnerability, particularly in security-sensitive areas. As a result, transfer-based adversarial attacks have attracted increasing interest as a means of assessing the security of deep learning models. However, adversarial samples usually transfer poorly across models because they overfit the particular architecture and feature representations of the source model. To address this problem, we propose the Intermediate Layer Attack with Attention guidance (IAA) to alleviate overfitting and enhance black-box transferability. IAA operates on an intermediate layer 𝑙 of the source model. Guided by the model's attention to the features of layer 𝑙, the attack seeks out and undermines the key features that are likely to be shared by diverse architectures. Notably, IAA improves existing white-box attacks without introducing significant degradation in visual perceptual quality: it maintains the white-box attack performance of the original algorithm while significantly enhancing its black-box transferability. Extensive experiments on ImageNet classifiers confirm the effectiveness of our method. The proposed IAA outperformed all state-of-the-art benchmarks in various white-box and black-box settings.
INDEX TERMS Deep learning, adversarial samples, black-box attack, transferability, intermediate layer, attention-guided.
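For concreteness, the sketch below illustrates what an attention-guided intermediate-layer attack of this kind might look like in PyTorch. It is a minimal illustration under stated assumptions, not the paper's implementation: the Grad-CAM-style channel weighting, the choice of `layer3` of a ResNet-50 as layer 𝑙, the function names (`attention_weights`, `iaa_attack`), and all hyperparameters are assumptions introduced here for exposition.

```python
import torch
import torchvision.models as models

# Illustrative sketch only: the attention weighting, layer choice, and
# hyperparameters below are assumptions, not the paper's actual method.

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

# Capture the features of an intermediate layer l via a forward hook.
# Choosing layer3 as l is an assumption made for this example.
features = {}
model.layer3.register_forward_hook(lambda m, inp, out: features.update(l=out))

def attention_weights(x, label):
    """Grad-CAM-style channel weights (an assumed attention proxy):
    gradient of the true-class logit w.r.t. the layer-l features,
    averaged over spatial positions."""
    x = x.clone().requires_grad_(True)
    logits = model(x)                       # forward pass fills features["l"]
    grad = torch.autograd.grad(logits[0, label], features["l"])[0]
    return grad.mean(dim=(2, 3), keepdim=True)   # shape (1, C, 1, 1)

def iaa_attack(x, label, eps=16 / 255, alpha=2 / 255, steps=10):
    """PGD-style loop that suppresses the attention-weighted features of
    layer l, projected onto an L_inf ball of radius eps.
    Assumes x is a (1, 3, H, W) image in [0, 1]; ImageNet input
    normalization is omitted here for brevity."""
    w = attention_weights(x, label).detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        model(x_adv)                                   # refresh features["l"]
        loss = (w * features["l"]).sum()               # attention-weighted response
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv - alpha * grad.sign()            # descend: erase key features
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # project to the eps-ball
        x_adv = torch.clamp(x_adv, 0, 1)               # keep a valid image
    return x_adv.detach()
```

In this reading, destroying features that the source model attends to, rather than directly maximizing its classification loss, is what would reduce overfitting to the source architecture: attention-weighted intermediate features are more likely to be shared across models than the source model's final decision boundary.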