Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.256

Generating Label Cohesive and Well-Formed Adversarial Claims

Abstract: Adversarial attacks reveal important vulnerabilities and flaws of trained models. One potent type of attack is the universal adversarial trigger: an individual n-gram that, when appended to instances of a class under attack, can trick a model into predicting a target class. However, for inference tasks such as fact checking, these triggers often inadvertently invert the meaning of the instances they are inserted in. In addition, such attacks produce semantically nonsensical inputs, as they simply concatenate…
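As a rough illustration of the attack the abstract describes, the sketch below appends a candidate trigger n-gram to each instance of the attacked class and measures how often a classifier flips to the target label. All names here are hypothetical placeholders, not the paper's actual models or triggers:

```python
# Minimal sketch of evaluating a universal adversarial trigger: a fixed
# n-gram appended to every instance of the attacked class, scored by how
# often it pushes the model to a chosen target label.
from typing import Callable, List

def attack_success_rate(
    predict: Callable[[str], str],   # any text classifier: claim -> label
    claims: List[str],               # instances of the class under attack
    trigger: str,                    # candidate universal trigger n-gram
    target_label: str,               # label the attacker wants to force
) -> float:
    """Fraction of claims flipped to target_label when the trigger is appended."""
    flipped = sum(predict(f"{claim} {trigger}") == target_label for claim in claims)
    return flipped / len(claims)

# Hypothetical usage with a toy model that keys on the trigger token:
dummy = lambda text: "REFUTES" if "nothing" in text else "SUPPORTS"
rate = attack_success_rate(dummy, ["The earth orbits the sun."], "nothing", "REFUTES")
print(rate)  # 1.0 for this toy model
```

As the abstract notes, a trigger found this way can invert the claim's meaning or yield ungrammatical text, which is exactly the failure mode the paper addresses.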

Cited by 32 publications (34 citation statements)
References 26 publications (20 reference statements)
“…The inter-annotator agreement (computed with Cohen's kappa [7]) between the annotators is 0.47, which signals "moderate" agreement [24]. This is comparable to the inter-annotator agreement in Atanasova et al. [1], where claims generated with GPT-2 were annotated for semantic coherence. Table 10 shows the Cohen's kappa for each dataset separately.…”
Section: Example #4 (supporting)
confidence: 51%
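For readers unfamiliar with the agreement statistic quoted above, here is a minimal sketch of computing Cohen's kappa with scikit-learn's cohen_kappa_score; the two annotators' binary coherence judgments are invented for illustration:

```python
# Minimal sketch of inter-annotator agreement via Cohen's kappa,
# as in the citation statement above. Labels are hypothetical.
from sklearn.metrics import cohen_kappa_score

# Invented binary coherence judgments from two annotators over ten claims.
annotator_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
annotator_b = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # ~0.52; 0.41-0.60 is conventionally "moderate"
```

Unlike raw percent agreement (0.8 here), kappa discounts the agreement expected by chance, which is why a value of 0.47 is reported as only "moderate".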
“…While much recent work in adversarial attacks aims to break NLI systems and is especially adapted to this problem [13,29], these stress tests have also been applied to several other tasks, e.g., Question Answering [49], Machine Translation [4], or Fact Checking [1,44]. Unfortunately, preserving the semantics of a sentence while automatically generating these adversarial attacks is difficult, which is why some works have defined small stress tests manually [19,27].…”
Section: Dataset (mentioning)
confidence: 99%