Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1610

A Gated Self-attention Memory Network for Answer Selection

Abstract: Answer selection is an important research problem, with applications in many areas. Previous deep learning-based approaches for the task mainly adopt the Compare-Aggregate architecture that performs word-level comparison followed by aggregation. In this work, we take a departure from the popular Compare-Aggregate architecture and instead propose a new gated self-attention memory network for the task. Combined with a simple transfer learning technique from a large-scale online corpus, our model outperforms previous methods…
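
The abstract describes the model only at a high level. As a rough illustration, here is a minimal PyTorch sketch of a gated self-attention layer: a self-attention step followed by a sigmoid gate that mixes the attended summary back into the input. The module name, gating formulation, and dimensions are assumptions made for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedSelfAttention(nn.Module):
    """Minimal sketch of a gated self-attention block (illustrative only,
    not the paper's exact architecture)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); every token attends over the sequence.
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = torch.matmul(q, k.transpose(-2, -1)) / (x.size(-1) ** 0.5)
        attended = torch.matmul(F.softmax(scores, dim=-1), v)
        # Sigmoid gate decides, per dimension, how much attended context
        # replaces the original representation.
        g = torch.sigmoid(self.gate(torch.cat([x, attended], dim=-1)))
        return g * attended + (1.0 - g) * x

# Example: a batch of 2 sequences, 7 tokens each, hidden size 128.
layer = GatedSelfAttention(d_model=128)
out = layer(torch.randn(2, 7, 128))
print(out.shape)  # torch.Size([2, 7, 128])
```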

Cited by 30 publications (34 citation statements). References 19 publications.
“…The third block presents the results of models that use both pre-trained language models and transfer learning. In particular, Yoon et al. (2018) use ELMo and transfer learning on the QNLI dataset; Lai et al. (2019) use BERT and perform transfer learning on the QNLI dataset; and Garg et al. (2019) use RoBERTa large and perform transfer learning from the Natural Questions dataset. We note that the MAP of efficient models ranges from 71% to 75%, while the MAP of expensive models ranges from 83% to 92%.…”
Section: State-of-the-art Results (mentioning; confidence: 99%)
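
The MAP figures quoted above are Mean Average Precision scores over ranked answer candidates. For reference, the snippet below is a small, generic MAP implementation for answer selection; the function names are illustrative and it is not tied to any cited paper's evaluation script.

```python
from typing import List

def average_precision(relevance: List[int]) -> float:
    """AP for one question, given candidates sorted by model score
    (1 = correct answer, 0 = incorrect)."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

def mean_average_precision(all_relevance: List[List[int]]) -> float:
    """MAP over a set of questions; questions without any correct answer
    are skipped, as is common in answer-selection evaluation."""
    scored = [r for r in all_relevance if any(r)]
    return sum(average_precision(r) for r in scored) / len(scored)

# Two questions: the first ranks the correct answer 2nd, the second ranks it 1st.
print(mean_average_precision([[0, 1, 0], [1, 0, 0]]))  # 0.75
```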
“…Despite obtaining better results than previous approaches, performing word-level attention and the aggregation steps needed to leverage the information extracted by the attention mechanism increases the computational cost of these methods. More recent models, e.g., (Lai et al., 2019; Garg et al., 2019; Yoon et al., 2018), leverage contextualized word representations pre-trained with BERT, ELMo, RoBERTa, etc. These approaches achieve state-of-the-art results for AS2, but they require significant computational power for pre-training, fine-tuning, and testing on the final task.…”
Section: Related Work (mentioning; confidence: 99%)
“…It is an active research problem with applications in many areas (Tay et al., 2018a; Tayyar Madabushi et al., 2018; Rao et al., 2019; Lai et al., 2020). Similar to most recent papers on this topic (Tay et al., 2018b; Lai et al., 2019; Garg et al., 2020), we cast the question answering problem as a binary classification problem by concatenating the question with each of the candidate answers and assigning a positive label to the concatenation containing the correct answer.…”
Section: Proposed Framework (mentioning; confidence: 99%)
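
The passage above casts answer selection as binary classification over question-answer concatenations. The sketch below shows one plausible way to build such pairs and rank candidates at inference time; the "[SEP]" separator and the stand-in score_pair scorer are illustrative assumptions, not code from the cited papers.

```python
from typing import Callable, List, Tuple

def build_pairs(question: str, candidates: List[str],
                correct_idx: int) -> List[Tuple[str, int]]:
    """Concatenate the question with each candidate; the pair containing
    the correct answer is labeled 1, all others 0."""
    return [
        (f"{question} [SEP] {answer}", 1 if i == correct_idx else 0)
        for i, answer in enumerate(candidates)
    ]

def rank_candidates(pairs: List[Tuple[str, int]],
                    score_pair: Callable[[str], float]) -> List[int]:
    """Sort candidate indices by the classifier's positive-class score."""
    scores = [score_pair(text) for text, _ in pairs]
    return sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)

# Toy usage with a stand-in scorer (a real system would use a trained classifier).
pairs = build_pairs("Who wrote Hamlet?",
                    ["Charles Dickens", "William Shakespeare", "Jane Austen"], 1)
print(rank_candidates(pairs, score_pair=lambda t: float("Shakespeare" in t)))
# [1, 0, 2]
```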
“…To update the memory, however, we first need an indexing mechanism for writing. Instead of using the original indexing of the NTM, we adopt the simpler indexing procedure from the memory network, which has been proven to be useful in this task (Lai et al., 2019). At time step $t$, for each incoming data point $x_t$, we compute the attention weight $w^e_{t,i}$ for the support vector $e^t_i$:…”
Section: Proposed Framework (mentioning; confidence: 99%)
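
The equation referenced at the end of the quote is not included in the excerpt. A common memory-network-style choice for this kind of write addressing is a softmax over the similarity between the incoming item $x_t$ and each memory slot $e^t_i$; the sketch below shows that standard formulation as an assumption, not the citing paper's exact definition.

```python
import torch
import torch.nn.functional as F

def write_attention(x_t: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
    """Content-based write weights over memory slots (generic memory-network
    style addressing, used here only as a stand-in for the elided equation).

    x_t:    (d,)   incoming data point at time step t
    memory: (n, d) support vectors e^t_i, one row per memory slot
    returns (n,)   attention weights w^e_{t,i}, which sum to 1
    """
    similarities = memory @ x_t            # dot-product similarity per slot
    return F.softmax(similarities, dim=0)  # normalize into write weights

# Toy usage: 4 memory slots of dimension 8.
memory = torch.randn(4, 8)
x_t = torch.randn(8)
w = write_attention(x_t, memory)
print(w, w.sum())  # weights over slots; sum is 1
```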