Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023), 2023
DOI: 10.18653/v1/2023.trustnlp-1.25
IMBERT: Making BERT Immune to Insertion-based Backdoor Attacks

Abstract: Backdoor attacks are an insidious security threat against machine learning models. Adversaries can manipulate the predictions of compromised models by inserting triggers into the training phase. Various backdoor attacks have been devised which can achieve nearly perfect attack success without affecting model predictions for clean inputs. Means of mitigating such vulnerabilities are underdeveloped, especially in natural language processing. To fill this gap, we introduce IMBERT, which uses either gradients or s…
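The abstract is truncated above, but the gradient-based idea it names can be illustrated. Below is a minimal, hypothetical sketch of inference-time trigger masking via input-gradient saliency, assuming a fine-tuned HuggingFace BERT classifier; the function name, the top-k heuristic, and the choice to replace suspect tokens with [MASK] are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: mask the tokens whose input gradients are largest,
# on the intuition that inserted triggers dominate the model's decision.
# Not the IMBERT authors' code; model, names, and top-k are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

def saliency_mask(text: str, k: int = 2) -> str:
    """Mask the k tokens with the largest input-gradient norm."""
    enc = tokenizer(text, return_tensors="pt")
    embeds = model.get_input_embeddings()(enc["input_ids"])
    embeds.retain_grad()  # keep gradients on this non-leaf tensor
    out = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"])
    pred = out.logits.argmax(dim=-1)
    # Gradient of the loss w.r.t. the predicted label scores each token.
    loss = torch.nn.functional.cross_entropy(out.logits, pred)
    loss.backward()
    scores = embeds.grad.norm(dim=-1).squeeze(0)  # one score per token
    scores[0] = scores[-1] = 0.0                  # never mask [CLS]/[SEP]
    ids = enc["input_ids"].squeeze(0).clone()
    ids[scores.topk(k).indices] = tokenizer.mask_token_id
    return tokenizer.decode(ids, skip_special_tokens=False)
```

The masked sentence would then be re-classified; if the prediction flips once the high-saliency tokens are removed, they were plausibly backdoor triggers.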

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1

Citation Types

0
1
0

Year Published

2023
2023
2023
2023

Publication Types

Select...
1

Relationship

0
1

Authors

Journals

Cited by 1 publication (1 citation statement). References 26 publications (58 reference statements).
“…The former method aims at identifying poisoned data by analyzing the anomalous characteristics of the training data (He et al., 2023b). The latter approach leverages external tools (Qi et al., 2021a) or the victim language models themselves (Yang et al., 2021b; He et al., 2023a) to either remove the triggers or entirely discard the poisoned data samples during inference…”
Section: Security Challenges in NLP (mentioning; confidence: 99%)
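For the external-tool route the statement cites (Qi et al., 2021a, i.e. ONION), the core test is perplexity-based: a word is suspicious if deleting it makes the sentence markedly more fluent to a language model. A rough sketch under that assumption follows; the GPT-2 scorer and the threshold value are illustrative choices, not the paper's exact configuration.

```python
# Rough sketch of ONION-style trigger filtering (Qi et al., 2021a):
# drop a word if deleting it lowers language-model perplexity by more
# than a threshold. Scorer model and threshold are assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

lm_tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = lm_tok(text, return_tensors="pt")["input_ids"]
    loss = lm(ids, labels=ids).loss  # mean next-token cross-entropy
    return float(torch.exp(loss))

def filter_triggers(sentence: str, threshold: float = 10.0) -> str:
    words = sentence.split()
    if len(words) <= 1:
        return sentence
    base = perplexity(sentence)
    kept = []
    for i, w in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        # A large perplexity drop when w is removed marks w as an outlier.
        if base - perplexity(reduced) <= threshold:
            kept.append(w)
    return " ".join(kept)
```

This contrasts with the self-defense approaches also cited (Yang et al., 2021b; He et al., 2023a), which rely on signals from the victim model itself rather than an external language model.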