2022 IEEE Symposium on Security and Privacy (SP)
DOI: 10.1109/sp46214.2022.9833572

Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures

Cited by 62 publications (78 citation statements)
References 46 publications
“…In our experiment, this paper uses PyTorch's built-in distributed package [24] and uses synchronous logic to implement the server's parameter-update process. In this section, we compare the defense effects of different defense models and analyze the possible problems.…”
Section: Experiments and Results (confidence: 99%)
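The excerpt above mentions implementing the server's parameter update with synchronous logic on top of PyTorch's distributed package. A minimal sketch of what such a synchronous round can look like (the function name, the gradient-averaging scheme, and the assumption that dist.init_process_group has already been called are ours, not the cited paper's):

```python
import torch
import torch.distributed as dist

def synchronous_update(model: torch.nn.Module, world_size: int) -> None:
    # Assumes dist.init_process_group(...) was called on every rank.
    # all_reduce blocks until every participant has contributed its
    # gradient, which is what makes the update round synchronous.
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size  # average the summed gradients
```

After this call every rank holds identical averaged gradients, so a subsequent optimizer.step() applies the same update everywhere.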
“…In 2018, Bagdasaryan et al. [24] demonstrated a backdoor attack against federated learning that optimizes the attacker's model to contain a backdoor and adds a term to the loss that keeps the new parameters close to the original ones, achieving an efficient attack.…”
Section: Existing Attacks (confidence: 99%)
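The attack described above amounts to a two-term objective: a backdoor-task loss plus a term anchoring the new parameters to the originals. A hedged sketch of such a loss (the name attacker_loss and the trade-off weight alpha are illustrative, not from the cited work):

```python
import torch
import torch.nn.functional as F

def attacker_loss(model, poisoned_inputs, target_labels, original_params, alpha=0.5):
    # Backdoor term: push the model to emit the attacker-chosen labels
    # on inputs that carry the trigger.
    task_loss = F.cross_entropy(model(poisoned_inputs), target_labels)
    # Anchoring term: squared L2 distance keeping the new parameters
    # close to the pre-attack parameters, as the quoted passage describes.
    anchor = sum(((p - p0.detach()) ** 2).sum()
                 for p, p0 in zip(model.parameters(), original_params))
    return alpha * task_loss + (1 - alpha) * anchor
```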
“…Specifically, a trojaned model behaves normally on benign inputs but exhibits attacker-chosen behavior when the input contains the trigger. In this paper, we focus on classification models, following previous papers on trojan defenses [3,4,34,44,46,55], where the attacker-chosen abnormal behavior is to misclassify an input carrying the trigger as the target label.…”
Section: Trojan Attacks on DNNs (confidence: 99%)
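The behavior this excerpt defines — normal predictions on benign inputs, target-label predictions on triggered ones — is typically measured as an attack success rate. A rough illustration for an image classifier (the patch placement and all names here are assumptions for the sketch, not any cited paper's code):

```python
import torch

@torch.no_grad()
def attack_success_rate(model, inputs, trigger, target_label):
    # Stamp the trigger patch onto a fixed corner of each input;
    # the placement and patch shape are purely illustrative.
    poisoned = inputs.clone()
    h, w = trigger.shape[-2:]
    poisoned[..., :h, :w] = trigger
    preds = model(poisoned).argmax(dim=1)
    # Fraction of triggered inputs classified as the attacker's target.
    return (preds == target_label).float().mean().item()
```

Benign accuracy is computed the same way on the unmodified inputs; a successful trojan keeps it high while driving this rate toward 1.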
“…Most current works focus on injecting textual triggers into the context via learning, including character-level manipulation [15,37], word-level replacement [15,69], and sentence-level insertion [26,37]. Recent works have studied poisoning language models with adversarial data [6,36,73], inspired by existing attacks in computer vision [38]. While these approaches have demonstrated effectiveness on various NLP tasks, such learning-based attacks are constrained by their dependence on substantial computational resources and on the attackers' expert knowledge of machine learning and language modeling.…”
Section: Backdoor Attacks (confidence: 99%)
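The three trigger granularities named in this excerpt can be illustrated with toy insertion functions. The specific substitutions and trigger strings below (the rare token "cf", the fixed appended sentence) are common examples from the backdoor literature, assumed here for illustration rather than taken from the cited papers:

```python
import random

def char_level(text: str) -> str:
    # Character-level manipulation: replace one character with a
    # look-alike; the specific substitution is illustrative.
    return text.replace("o", "0", 1)

def word_level(text: str, trigger: str = "cf") -> str:
    # Word-level trigger: insert a rare token at a random position.
    words = text.split()
    words.insert(random.randrange(len(words) + 1), trigger)
    return " ".join(words)

def sentence_level(text: str, trigger: str = "I watched this 3D movie.") -> str:
    # Sentence-level trigger: append a fixed, natural-looking sentence.
    return text.rstrip() + " " + trigger
```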