2022
DOI: 10.1162/tacl_a_00450

Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition

Abstract: Recent efforts to create challenge benchmarks that test the abilities of natural language understanding models have largely depended on human annotations. In this work, we introduce the “Break, Perturb, Build” (BPB) framework for automatic reasoning-oriented perturbation of question-answer pairs. BPB represents a question by decomposing it into the reasoning steps that are required to answer it, symbolically perturbs the decomposition, and then generates new question-answer pairs. We demonstrate the effectiveness…
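A minimal sketch of the decompose-perturb-regenerate loop the abstract describes may help. This is hypothetical code, not the authors' implementation: the decomposition, the symbolic edit rule, and the regenerated question are toy stand-ins for the trained models used in the paper.

```python
# Hypothetical sketch of the "Break, Perturb, Build" loop described in the
# abstract. The decomposition, the symbolic edit rule, and the regenerated
# question below are toy stand-ins, not the authors' models or data.

def perturb_steps(steps: list[str]) -> list[str]:
    """Perturb: symbolically edit one reasoning step, here by flipping a
    superlative (highest -> lowest) in the decomposition."""
    return [s.replace("highest", "lowest") for s in steps]

# Break: a QDMR-style decomposition of the original question.
question = "How many yards was the longest touchdown?"
steps = [
    "return touchdowns",
    "return yards of #1",
    "return highest of #2",
]

# Perturb the decomposition, not the surface question.
new_steps = perturb_steps(steps)

# Build: in the paper a generator produces the new question and the answer
# is recomputed; here the regenerated question is written by hand.
new_question = "How many yards was the shortest touchdown?"

print(new_steps)     # ['return touchdowns', 'return yards of #1', 'return lowest of #2']
print(new_question)
```

The key point the sketch illustrates is that the edit happens at the level of reasoning steps, so the new question-answer pair is guaranteed to differ from the original in a controlled, interpretable way.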

Cited by 9 publications (10 citation statements). References: 33 publications.
“…For IIRC, we consider two settings: the gold setting (IIRC-G), which uses only gold supporting sentences as the reading-comprehension context, and the retrieved setting (IIRC-R), which retrieves paragraphs using a retrieval-marginalization method (Ni et al., 2021). We evaluate robustness using the DROP contrast set and the DROP BPB contrast set (Geva et al., 2022). For robustness evaluation, we fine-tune only on the DROP dataset and evaluate directly on the contrast sets.…”
Section: Datasets (mentioning)
confidence: 99%
“…The BREAK dataset (Wolfson et al., 2020), on the other hand, defined a standardized meaning-representation format (inspired by semantic parsing) for several QA datasets. This shared representation has allowed the development of contrastive datasets (Geva et al., 2022). In this work, we leverage these annotations to build a dataset that teaches broad reasoning skills to models.…”
Section: Related Work (mentioning)
confidence: 99%
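For context, a QDMR decomposition in the BREAK format represents a question as a sequence of atomic steps that may reference earlier steps by index. The example below is our own illustration, not taken from the dataset:

```python
# Illustrative QDMR-style decomposition in the spirit of the BREAK format:
# each step is an atomic operation that may reference earlier steps by index.
# The question and steps are our own example, not taken from the dataset.
qdmr = {
    "question": "What is the population of the largest city in Texas?",
    "steps": [
        "return cities in Texas",            # step #1
        "return #1 where size is highest",   # step #2 filters step #1
        "return population of #2",           # step #3 projects over step #2
    ],
}

for i, step in enumerate(qdmr["steps"], start=1):
    print(f"#{i}: {step}")
```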
“…We consult quantifier taxonomy studies (Keenan and Westerståhl, 1997; Peters and Westerståhl, 2006; Szymanik and Thorne, 2015; Szymanik, 2016) and derive a categorization set for quantifier analysis in NLU benchmarks. While coreference (Ogrodniczuk et al., 2019, 2020), negation (Hossain et al., 2020; Hartmann et al., 2021), and consistency (Li et al., 2019; Ribeiro et al., 2019; Asai and Hajishirzi, 2020; Geva et al., 2022) have received attention, there has been little work on generalized quantifiers as a source of error in NLU, let alone in multilingual NLU. It remains an open problem whether LMs adequately represent the semantics of quantifier words, or whether they provide a basis for resolving scopal ambiguities.…”
Section: Introduction (mentioning)
confidence: 99%
“…As QDMR is derived entirely from the original question, it is agnostic to the underlying domain, schema, and even the form of knowledge representation. It has been applied to questions over text, images, and relational databases [16, 46, 52]. Here, we utilize the QDMR structure and show that it can successfully be mapped to SQL.…”
(mentioning)
confidence: 99%
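As a rough illustration of how QDMR steps can be composed into SQL, consider the decomposition from the earlier example. The schema and the hand-written mapping below are our assumptions, not the cited work's actual procedure:

```python
# Rough, hand-written illustration of mapping QDMR steps to SQL over an
# assumed schema city(name, state, size, population). The mapping and schema
# are our assumptions, not the cited work's actual procedure.
steps = [
    "return cities in Texas",           # step #1 -> WHERE filter
    "return #1 where size is highest",  # step #2 -> ORDER BY ... LIMIT 1
    "return population of #2",          # step #3 -> outer SELECT projection
]

sql = """
SELECT t.population
FROM (
    SELECT *
    FROM city
    WHERE state = 'Texas'        -- step #1
    ORDER BY size DESC
    LIMIT 1                      -- step #2
) AS t                           -- step #3 projects over the result
"""
print(sql.strip())
```

Each QDMR step maps to one relational operation, so the nesting of the SQL query mirrors the step-reference structure (#1, #2) of the decomposition.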