Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.247

Span Selection Pre-training for Question Answering

Abstract: BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension to better align the pre-training from memorization to understanding. Span Selection Pre-Training (SSPT) poses cloze-like train…
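The abstract only sketches the SSPT objective, so the following is a minimal illustrative sketch of the kind of cloze-style instance it describes: a span is blanked out of a query sentence and must be selected from a separate passage that contains it. The function name, the [BLANK] token, and the hand-written passage are assumptions for illustration, not the authors' released pipeline.

```python
BLANK = "[BLANK]"  # assumed placeholder token; the paper's actual special token may differ


def make_sspt_instance(query_sentence: str, answer_span: str, passage: str) -> dict:
    """Illustrative sketch of one span-selection pre-training instance.

    A span in `query_sentence` is replaced by a blank token to form a
    cloze-style query; `passage` is a separate text containing the same span,
    and the training target is the span's character offsets in the passage,
    so the answer is selected from text rather than recalled from parameters.
    """
    if answer_span not in query_sentence or answer_span not in passage:
        raise ValueError("answer_span must occur in both the query sentence and the passage")
    cloze_query = query_sentence.replace(answer_span, BLANK, 1)
    start = passage.index(answer_span)
    return {
        "query": cloze_query,
        "passage": passage,
        "answer_start": start,
        "answer_end": start + len(answer_span),
    }


# Toy usage with hand-written strings (no retrieval step shown).
example = make_sspt_instance(
    query_sentence="The novel was written by Charles Dickens in 1859.",
    answer_span="Charles Dickens",
    passage="A Tale of Two Cities, published in 1859, is a historical novel "
            "by Charles Dickens set in London and Paris.",
)
print(example["query"])
print(example["passage"][example["answer_start"]:example["answer_end"]])
```

Framed this way, the pre-training target must be found in the passage rather than recalled from the model's parameters, which is what the abstract means by moving from memorization toward understanding.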

Cited by 51 publications (49 citation statements); references 22 publications.
“…Reddit has been shown to provide natural conversational English data for learning semantic representations that work well in downstream tasks related to dialog and conversation (Al-Rfou et al., 2016; Cer et al., 2018; Henderson et al., 2019b; Coope et al., 2020). Therefore, following… [Footnote 1: The pairwise cloze task has been inspired by the recent span selection objective applied to extractive QA by Glass et al. (2020): they create examples emulating extractive QA pairs with long passages and short question sentences. Another similar approach to extractive QA has been proposed by Ram et al. (2021).]…”
Section: Pairwise Cloze Data Preparation (mentioning)
confidence: 99%
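The snippet above names a "pairwise cloze" task but does not describe its mechanics; the sketch below is only a guess at the data preparation it implies, assuming pairs of sentences that share a keyphrase, with the keyphrase blanked in one sentence and left intact in the other. The function name, the pairing heuristic, and the [BLANK] token are all assumptions, not the cited authors' code.

```python
from itertools import combinations

BLANK = "[BLANK]"  # assumed placeholder token


def pairwise_cloze_examples(sentences, keyphrases):
    """Hypothetical pairwise cloze data preparation.

    For every keyphrase, pair up sentences that both contain it: one sentence
    becomes the cloze template (keyphrase blanked out), the other stays intact
    and the model must locate the keyphrase span inside it.
    """
    examples = []
    for phrase in keyphrases:
        matching = [s for s in sentences if phrase in s]
        for template_sent, input_sent in combinations(matching, 2):
            examples.append({
                "template": template_sent.replace(phrase, BLANK, 1),
                "input": input_sent,
                "span": phrase,
            })
    return examples


# Toy usage on three hand-written sentences sharing the phrase "two tickets".
data = pairwise_cloze_examples(
    sentences=[
        "I would like two tickets for tonight please.",
        "Can you book two tickets for the late show?",
        "What time does the late show start?",
    ],
    keyphrases=["two tickets"],
)
print(len(data), data[0]["template"])
```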
“…However, we detect several gaps in the existing setup, and set out to address them in this work. First, recent work in NLP has validated that a stronger alignment between a pretraining task and an end task can yield performance gains for tasks such as extractive question answering (Glass et al., 2020) and paraphrase and translation (Lewis et al., 2020). We ask whether it is possible to design a pretraining task which is more suitable for slot labeling in conversational applications.…”
Section: Introduction (mentioning)
confidence: 99%
“…[Results table fragment, two metrics per system: Baseline 10.83 / 40.16; QFE (Nishida et al., 2019) 34.63 / 59.61; DFGN (Qiu et al., 2019) 33.62 / 59.82; TAP2 (Glass et al., 2019) 39.77 / 69.12; HGN (Fang et al., 2019) 43.57 / 71.03; SAE (Tu et al., 2019a) 45…] …several extra modules in the graph fusion block, including query-entity attention, query update mechanism, and weak supervision. Prediction Layer.…”
Section: Setting (mentioning)
confidence: 99%
“…Table 1 shows that this strategy can provide an absolute improvement of 2.5% over a model that starts with just the default BERT language model. [Footnote 12: See http://www.ibm.biz/confidence_thresholding for more on choosing business-specific thresholds. Footnote 13: We only use 1 P100 GPU or 8 CPU threads in latency experiments.] [Table fragment: columns Pre-Training / EM / F1; row: BERT (Devlin et al., 2018) …] We also employ (Glass et al., 2019)'s approach to using an unsupervised auxiliary task that is better aligned to our final task (i.e. MRC) than the default Masked Language Model and Next Sentence Prediction used in (Devlin et al., 2018) to pre-train the BERT models.…”
Section: Pre-training and Data Augmentation (mentioning)
confidence: 99%
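The footnote above points to a page on choosing business-specific confidence thresholds without giving details; as a rough, assumption-laden sketch, thresholding usually means returning a reader's answer only when its score clears a tuned cutoff and abstaining otherwise. The field names and score scale below are invented for illustration, not IBM's API.

```python
from typing import Optional


def apply_confidence_threshold(prediction: dict, threshold: float) -> Optional[dict]:
    """Return the predicted answer only if its confidence clears the threshold.

    `prediction` is assumed to look like {"answer": str, "confidence": float}
    with confidence in [0, 1]; below the threshold the system abstains
    (returns None), trading answer coverage for precision.
    """
    if prediction["confidence"] >= threshold:
        return prediction
    return None


# Toy usage: the stricter threshold abstains on the low-confidence prediction.
for pred in [{"answer": "1859", "confidence": 0.92},
             {"answer": "London", "confidence": 0.41}]:
    print(apply_confidence_threshold(pred, threshold=0.6))
```

Where exactly the cutoff sits is a product decision (hence "business-specific"): raising the threshold improves precision at the cost of more abstentions.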