2019
DOI: 10.5120/ijca2019919384

Toward Mitigating Adversarial Texts

Abstract: Neural networks are frequently used for text classification but can be vulnerable to adversarial examples: inputs produced by introducing small perturbations that cause the network to output an incorrect classification. Previous attempts to generate black-box adversarial texts include generating nonword misspellings, natural noise, synthetic noise, and lexical substitutions. This paper proposes a defense against black-box adversarial attacks using a…
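As background for the abstract's notion of a small perturbation, here is a minimal, self-contained sketch showing how a single nonword misspelling can flip a naive classifier's output. The toy keyword classifier and example sentences are illustrative assumptions, not from the paper:

```python
# Toy illustration of a character-level adversarial text attack.
# The keyword "classifier" and example sentences are hypothetical,
# not taken from the paper.

NEGATIVE_WORDS = {"terrible", "awful", "boring"}

def toy_sentiment(text: str) -> str:
    """Label text 'negative' if any known negative keyword appears."""
    words = text.lower().split()
    return "negative" if any(w in NEGATIVE_WORDS for w in words) else "positive"

clean = "the movie was terrible"
# Perturbation: swap two adjacent characters, creating a nonword misspelling.
adversarial = "the movie was terribel"

print(toy_sentiment(clean))        # -> negative (correct)
print(toy_sentiment(adversarial))  # -> positive (misclassified)
```

Real neural classifiers fail in subtler ways, but the mechanism is the same: a perturbation small enough to preserve human readability moves the input off the patterns the model learned.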

Cited by 12 publications (5 citation statements) | References 17 publications
“…(2) The Bi-LSTM model is more vulnerable to the two attacks than the CNN model, by a 12.45% accuracy difference on average. This supports the conclusion from previous research that, in the NLP domain, deep CNNs tend to be more robust than RNN models (Ren et al., 2019; Alshemali and Kalita, 2019). Table 3: The accuracy of the non-neural classification models under adversarial attacks, with and without the defense applied. Percent Increase is the percent increase in classification accuracy with the defense applied.…”
Section: Effectiveness of the Defense (supporting)
confidence: 88%
“…Defenses based on spell and syntax checkers are successful against character-level text attacks (Pruthi et al., 2019; Wang et al., 2019; Alshemali and Kalita, 2019). In contrast, these solutions are not effective against word-level attacks that preserve language correctness (Wang et al., 2019).…”
Section: Defense Against Adversarial Attacks in NLP (mentioning)
confidence: 99%
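A minimal sketch of a spell-check pre-processing defense of the kind described above, assuming a small fixed vocabulary and a generic Norvig-style edit-distance-1 corrector. This is an illustrative stand-in, not the specific checkers cited:

```python
# Generic spell-check defense: normalize each token to an in-vocabulary
# word within one edit before classification. The vocabulary, function
# names, and example sentence are assumptions for illustration.

VOCAB = {"the", "movie", "was", "terrible", "great", "plot"}
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word: str) -> set:
    """All strings one edit away: deletes, transposes, replaces, inserts."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {L + R[1:] for L, R in splits if R}
    transposes = {L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1}
    replaces = {L + c + R[1:] for L, R in splits if R for c in LETTERS}
    inserts = {L + c + R for L, R in splits for c in LETTERS}
    return deletes | transposes | replaces | inserts

def correct(word: str) -> str:
    """Keep known words; otherwise snap to an in-vocabulary neighbor."""
    if word in VOCAB:
        return word
    candidates = edits1(word) & VOCAB
    return min(candidates) if candidates else word  # deterministic choice

def defend(text: str) -> str:
    """Apply token-level correction before the text reaches the classifier."""
    return " ".join(correct(w) for w in text.lower().split())

print(defend("the movie was terribel"))  # -> "the movie was terrible"
```

This also makes the cited limitation concrete: a word-level substitution such as "terrible" → "dreadful" is already in-vocabulary (or at least a valid word), so an edit-distance corrector passes it through untouched.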
“…Following the methodology used in [2] to test the spellcheckers, we generated four types of adversarial text, including our attack, using a list of the top 20 most frequent words in the SMS dataset. We used the DeepWordBug method proposed by Gao et al. [17] to generate three adversarial texts: (1) insertion: we inserted one random character into each word (e.g., c*all); (2) deletion: we removed the second character (one removal per word); and (3) swapping: we swapped the second and third characters in each word (one swap per word).…”
Section: Effect of Adversarial Text on Auto-correction (mentioning)
confidence: 99%
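The three DeepWordBug-style perturbations quoted above (insertion, deletion, swapping) can be sketched as follows. The sample words and the insertion alphabet are assumptions; the per-word parameters mirror the quoted description:

```python
import random

# Sketch of the three character-level perturbations quoted above,
# in the style of DeepWordBug (Gao et al.). Sample words and the
# insertion alphabet are assumptions, not from the cited paper.

ALPHABET = "abcdefghijklmnopqrstuvwxyz*"  # the quoted example inserts '*'

def insert_char(word: str, rng: random.Random) -> str:
    """Insertion: add one random character at a random position (e.g., c*all)."""
    pos = rng.randrange(len(word) + 1)
    return word[:pos] + rng.choice(ALPHABET) + word[pos:]

def delete_second(word: str) -> str:
    """Deletion: remove the second character (one removal per word)."""
    return word[0] + word[2:] if len(word) > 1 else word

def swap_second_third(word: str) -> str:
    """Swapping: exchange the second and third characters (one swap per word)."""
    return word[0] + word[2] + word[1] + word[3:] if len(word) > 2 else word

rng = random.Random(0)  # fixed seed for reproducible output
for w in ["call", "free", "winner"]:
    print(insert_char(w, rng), delete_second(w), swap_second_third(w))
```

Each function applies exactly one edit per word, matching the one-perturbation-per-word constraint in the quoted methodology.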
“…On the other hand, using spell-checking algorithms is the most common defence method against character-level perturbations in NLP tasks [29]. Although spell-checking methods can detect and correct errors or adversarial examples, they cannot be applied in all domains because their performance varies depending on the type of misspelling [2].…”
Section: Introduction (mentioning)
confidence: 99%