Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.247

Span Selection Pre-training for Question Answering

Abstract: BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension to better align the pre-training from memorization to understanding. Span Selection Pre-Training (SSPT) poses cloze-like train…
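The abstract only sketches the SSPT objective, so the following is a minimal illustrative sketch of the kind of cloze-style instance it describes: a span is blanked out of a query sentence and must be selected from a separate passage that contains it. The function name, the [BLANK] token, and the hand-written passage are assumptions for illustration, not the authors' released pipeline.

```python
BLANK = "[BLANK]"  # assumed placeholder token; the paper's actual special token may differ


def make_sspt_instance(query_sentence: str, answer_span: str, passage: str) -> dict:
    """Illustrative sketch of one span-selection pre-training instance.

    A span in `query_sentence` is replaced by a blank token to form a
    cloze-style query; `passage` is a separate text containing the same span,
    and the training target is the span's character offsets in the passage,
    so the answer is selected from text rather than recalled from parameters.
    """
    if answer_span not in query_sentence or answer_span not in passage:
        raise ValueError("answer_span must occur in both the query sentence and the passage")
    cloze_query = query_sentence.replace(answer_span, BLANK, 1)
    start = passage.index(answer_span)
    return {
        "query": cloze_query,
        "passage": passage,
        "answer_start": start,
        "answer_end": start + len(answer_span),
    }


# Toy usage with hand-written strings (no retrieval step shown).
example = make_sspt_instance(
    query_sentence="The novel was written by Charles Dickens in 1859.",
    answer_span="Charles Dickens",
    passage="A Tale of Two Cities, published in 1859, is a historical novel "
            "by Charles Dickens set in London and Paris.",
)
print(example["query"])
print(example["passage"][example["answer_start"]:example["answer_end"]])
```

Framed this way, the pre-training target must be found in the passage rather than recalled from the model's parameters, which is what the abstract means by moving from memorization toward understanding.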

Cited by 51 publications (49 citation statements); references 22 publications.
“…Reddit has been shown to provide natural conversational English data for learning semantic representations that work well in downstream tasks related to dialog and conversation (Al-Rfou et al., 2016; Cer et al., 2018; Henderson et al., 2019b; Coope et al., 2020). Therefore, following… [Footnote 1: The pairwise cloze task has been inspired by the recent span selection objective applied to extractive QA by Glass et al. (2020): they create examples emulating extractive QA pairs with long passages and short question sentences. Another similar approach to extractive QA has been proposed by Ram et al. (2021).]…”
Section: Pairwise Cloze Data Preparation (mentioning)
confidence: 99%
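The snippet above names a "pairwise cloze" task but does not describe its mechanics; the sketch below is only a guess at the data preparation it implies, assuming pairs of sentences that share a keyphrase, with the keyphrase blanked in one sentence and left intact in the other. The function name, the pairing heuristic, and the [BLANK] token are all assumptions, not the cited authors' code.

```python
from itertools import combinations

BLANK = "[BLANK]"  # assumed placeholder token


def pairwise_cloze_examples(sentences, keyphrases):
    """Hypothetical pairwise cloze data preparation.

    For every keyphrase, pair up sentences that both contain it: one sentence
    becomes the cloze template (keyphrase blanked out), the other stays intact
    and the model must locate the keyphrase span inside it.
    """
    examples = []
    for phrase in keyphrases:
        matching = [s for s in sentences if phrase in s]
        for template_sent, input_sent in combinations(matching, 2):
            examples.append({
                "template": template_sent.replace(phrase, BLANK, 1),
                "input": input_sent,
                "span": phrase,
            })
    return examples


# Toy usage on three hand-written sentences sharing the phrase "two tickets".
data = pairwise_cloze_examples(
    sentences=[
        "I would like two tickets for tonight please.",
        "Can you book two tickets for the late show?",
        "What time does the late show start?",
    ],
    keyphrases=["two tickets"],
)
print(len(data), data[0]["template"])
```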
“…However, we detect several gaps in the existing setup, and set out to address them in this work. First, recent work in NLP has validated that a stronger alignment between a pretraining task and an end task can yield performance gains for tasks such as extractive question answering (Glass et al., 2020) and paraphrase and translation (Lewis et al., 2020). We ask whether it is possible to design a pretraining task which is more suitable for slot labeling in conversational applications.…”
Section: Introduction (mentioning)
confidence: 99%
“…[Results table fragment, two metrics per system: Baseline 10.83 / 40.16; QFE (Nishida et al., 2019) 34.63 / 59.61; DFGN (Qiu et al., 2019) 33.62 / 59.82; TAP2 (Glass et al., 2019) 39.77 / 69.12; HGN (Fang et al., 2019) 43.57 / 71.03; SAE (Tu et al., 2019a) 45…] …several extra modules in the graph fusion block, including query-entity attention, query update mechanism, and weak supervision. Prediction Layer.…”
Section: Setting (mentioning)
confidence: 99%
“…Table 1 shows that this strategy can provide an absolute improvement of 2.5% over a model that starts with just the default BERT language model. [Footnote 12: See http://www.ibm.biz/confidence_thresholding for more on choosing business-specific thresholds. Footnote 13: We only use 1 P100 GPU or 8 CPU threads in latency experiments.] [Table fragment: columns Pre-Training / EM / F1; row: BERT (Devlin et al., 2018) …] We also employ (Glass et al., 2019)'s approach to using an unsupervised auxiliary task that is better aligned to our final task (i.e. MRC) than the default Masked Language Model and Next Sentence Prediction used in (Devlin et al., 2018) to pre-train the BERT models.…”
Section: Pre-training and Data Augmentation (mentioning)
confidence: 99%
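The footnote above points to a page on choosing business-specific confidence thresholds without giving details; as a rough, assumption-laden sketch, thresholding usually means returning a reader's answer only when its score clears a tuned cutoff and abstaining otherwise. The field names and score scale below are invented for illustration, not IBM's API.

```python
from typing import Optional


def apply_confidence_threshold(prediction: dict, threshold: float) -> Optional[dict]:
    """Return the predicted answer only if its confidence clears the threshold.

    `prediction` is assumed to look like {"answer": str, "confidence": float}
    with confidence in [0, 1]; below the threshold the system abstains
    (returns None), trading answer coverage for precision.
    """
    if prediction["confidence"] >= threshold:
        return prediction
    return None


# Toy usage: the stricter threshold abstains on the low-confidence prediction.
for pred in [{"answer": "1859", "confidence": 0.92},
             {"answer": "London", "confidence": 0.41}]:
    print(apply_confidence_threshold(pred, threshold=0.6))
```

Where exactly the cutoff sits is a product decision (hence "business-specific"): raising the threshold improves precision at the cost of more abstentions.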