Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.9
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA

Abstract: Recent works have shown that supervised models often exploit data artifacts to achieve good test scores while their performance severely degrades on samples outside their training distribution. Contrast sets quantify this phenomenon by perturbing test samples in a minimal way such that the output label is modified. While most contrast sets were created manually, requiring intensive annotation effort, we present a novel method which leverages a rich semantic input representation to automatically generate contrast sets…
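To make the abstract's idea concrete, the following is a minimal illustrative sketch of a scene-graph-based contrast example: minimally edit one attribute in the graph so that the gold answer to the same question flips. The scene-graph schema, question template, and function names here are hypothetical, not the authors' actual pipeline.

```python
import copy

def perturb_attribute(scene_graph, obj_id, new_value):
    """Return a minimally edited copy of the scene graph with one
    attribute of one object replaced."""
    sg = copy.deepcopy(scene_graph)
    sg["objects"][obj_id]["attributes"] = [new_value]
    return sg

def make_example(scene_graph, obj_id, attribute):
    """Build a yes/no question about an attribute and derive its gold
    answer directly from the scene graph."""
    name = scene_graph["objects"][obj_id]["name"]
    question = f"Is the {name} {attribute}?"
    answer = "yes" if attribute in scene_graph["objects"][obj_id]["attributes"] else "no"
    return question, answer

# Toy GQA-style scene graph (hypothetical structure).
sg = {"objects": {"o1": {"name": "car", "attributes": ["red"]}}}

q1, a1 = make_example(sg, "o1", "red")      # original: ("Is the car red?", "yes")
sg2 = perturb_attribute(sg, "o1", "blue")   # minimal perturbation of the input
q2, a2 = make_example(sg2, "o1", "red")     # contrast: same question, answer "no"
```

The key property of a contrast set is visible here: the question text is identical, the input changes minimally, and the label flips, so a model relying on surface artifacts rather than the scene content will answer inconsistently across the pair.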

Cited by 21 publications (17 citation statements); references 21 publications.
“…specific concepts, corroborating the findings of Bitton et al. (2021). Interestingly, the best performing model (LXMERT) is not always the most consistent.…”
Section: Metrics (supporting)
confidence: 65%
“…Some recent work has sought to evaluate models using consistency and other metrics (Hudson and Manning, 2019; Shah et al., 2019; Ribeiro et al., 2020a; Selvaraju et al., 2020; Bitton et al., 2021). These works often evaluate consistency through question entailment and implication, or simply through contrasting examples in the case of Bitton et al. (2021).…”
Section: Consistency as Model Comprehension (mentioning)
confidence: 99%
“…We generate perturbations at the level of the underlying reasoning process, in the context of QA. Last, Bitton et al. (2021) used scene graphs to generate examples for visual QA. However, they assumed the existence of a gold scene graph at the input.…”
Section: Related Work (mentioning)
confidence: 99%
“…Recently, methods for automatic generation of contrast sets were proposed. However, current methods are restricted to shallow surface perturbations (Mille et al., 2021), specific reasoning skills, or rely on expensive annotations (Bitton et al., 2021). Thus, automatic generation of examples that test high-level reasoning abilities of models and their robustness to fine semantic distinctions remains an open challenge.…”
Section: Introduction (mentioning)
confidence: 99%