Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018
DOI: 10.24963/ijcai.2018/604

Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling

Abstract: Many natural language processing tasks rely solely on sparse dependencies between a few tokens in a sentence. Soft attention mechanisms show promising performance in modeling local/global dependencies via soft probabilities between every pair of tokens, but they are neither effective nor efficient when applied to long sentences. By contrast, hard attention mechanisms directly select a subset of tokens but are difficult and inefficient to train due to their combinatorial nature. In this paper, we integrate both soft and…
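The contrast the abstract draws can be made concrete with a small sketch: soft attention weights every token through a differentiable softmax, while hard attention commits to a discrete subset of tokens. This is an illustrative NumPy sketch only; the function names, shapes, and the top-k selection rule are assumptions, not the paper's implementation.

```python
import numpy as np

def soft_attention(query, keys, values):
    """Soft attention: every token receives a probability weight (differentiable)."""
    scores = keys @ query                      # (n,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over all n tokens
    return weights @ values                    # weighted average over all tokens

def hard_attention(query, keys, values, k=3):
    """Hard attention: select a discrete subset of tokens (non-differentiable)."""
    scores = keys @ query
    selected = np.argsort(scores)[-k:]         # keep only the k highest-scoring tokens
    return values[selected].mean(axis=0)       # use only the selected tokens

# toy example: 6 tokens with 4-dimensional key/value vectors
rng = np.random.default_rng(0)
keys = rng.normal(size=(6, 4))
values = rng.normal(size=(6, 4))
query = rng.normal(size=4)
print(soft_attention(query, keys, values))
print(hard_attention(query, keys, values))
```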

Cited by 102 publications (67 citation statements) · References 2 publications
“…In practice, when treating content selection as latent variables, the model tends to end up with a trivial solution of always selecting all source tokens (Shen et al., 2018a; Ke et al., 2018). This behavior is understandable since Eq.…”
Section: Degree of Controllability
confidence: 99%
“…Out-of-vocabulary words in the testing dataset were represented by the GloVe embedding of their most similar word, as given by the Jaro-Winkler similarity metric [51]. [Accuracy table flattened during extraction: [7] 88.0 / 72.4 / 71.9; Nie and Bansal [31] 86.1 / 74.6 / 73.6; Chen et al. [8] 85.5 / 74.9 / 74.9; Conneau et al. [12] 85.0 / – / –; Densely Interactive Inference Network [16] 88.0 / 79.2 / 79.1; Directional Self-Attention Encoders [38] 85.6 / 71.0 / 71.4; Compare-Propagate Alignment-Factorized Encoders [42] 85.9 / 78.7 / 77.9; Gumbel TreeLSTM Encoders [10] 86.0 / – / –; Reinforced Self-Attention Network [39] 86.3 / – / –; Distance-Based Self-Attention Network [18] 86.…] The model was implemented with the Keras deep learning framework, and the corresponding source code is available on GitHub.…”
Section: Parameters Involved in the Proposed Approach
confidence: 99%
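The out-of-vocabulary back-off described in this excerpt can be sketched in a few lines. This is a minimal sketch, assuming the jellyfish package for Jaro-Winkler similarity; embed_token and the tiny glove dictionary are hypothetical stand-ins, not the cited authors' code or real GloVe vectors.

```python
import jellyfish  # assumed available; provides jaro_winkler_similarity

def embed_token(token, glove):
    """Look up a GloVe vector; for out-of-vocabulary tokens, back off to the
    embedding of the most similar in-vocabulary word under Jaro-Winkler."""
    if token in glove:
        return glove[token]
    closest = max(glove, key=lambda w: jellyfish.jaro_winkler_similarity(token, w))
    return glove[closest]

# toy stand-in for a full GloVe table (hypothetical vectors)
glove = {"color": [0.10, 0.25], "colour": [0.10, 0.20], "flavor": [0.90, 0.40]}
print(embed_token("colr", glove))  # OOV "colr" backs off to the closest spelling, "color"
```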
“…The MR, SST, TREC, CR, SUBJ and MPQA are evaluated with accuracy (Conneau et al. 2017). The MRPC is evaluated with both accuracy and F1 (Subramanian et al. 2018). The SICK-E and SICK-R are evaluated with Pearson correlation (Tai, Socher, and Manning 2015).…”
Section: Swapping Training for NLP Transfer Tasks
confidence: 99%
“…The essential component of CAFE is a compare-propagate architecture, which first compares the two text fragments and then propagates the aligned features to upper layers for representation learning. Shen et al. (2018) presented reinforced self-attention (ReSA), which aims to combine the benefits of soft attention with a newly proposed hard attention mechanism called reinforced sequence sampling (RSS). They further plugged ReSA into a source2token self-attention model and applied it to NLI tasks.…”
Section: Related Work
confidence: 99%
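To make this excerpt's description concrete, below is a minimal NumPy sketch of the idea it attributes to Shen et al. (2018): a hard, sampled token selection in the spirit of RSS whose binary mask then restricts a soft self-attention. The function names, shapes, sigmoid scorer, and Bernoulli sampler are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforced_sequence_sampling(token_feats, w):
    """RSS-style hard attention (sketch): sample a binary keep/drop decision per
    token from a Bernoulli whose probability is predicted from the token features;
    the discrete sampling is why training relies on reinforcement learning."""
    keep_prob = 1.0 / (1.0 + np.exp(-(token_feats @ w)))  # per-token keep probability
    return rng.binomial(1, keep_prob)                      # hard 0/1 selection mask

def masked_soft_self_attention(token_feats, mask):
    """Soft self-attention restricted to the tokens kept by the hard mask."""
    if not mask.any():
        mask = np.ones_like(mask)                          # degenerate case: keep all tokens
    scores = token_feats @ token_feats.T                   # pairwise alignment scores
    scores = np.where(mask[None, :] == 1, scores, -1e9)    # dropped tokens get ~zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ token_feats                           # one context vector per token

tokens = rng.normal(size=(8, 16))   # 8 tokens with 16-dimensional features
w = rng.normal(size=16)
mask = reinforced_sequence_sampling(tokens, w)
print(mask, masked_soft_self_attention(tokens, mask).shape)
```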