Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018
DOI: 10.24963/ijcai.2018/604

Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling

Abstract: Many natural language processing tasks rely solely on sparse dependencies between a few tokens in a sentence. Soft attention mechanisms show promising performance in modeling local/global dependencies via soft probabilities between every pair of tokens, but they are neither effective nor efficient when applied to long sentences. By contrast, hard attention mechanisms directly select a subset of tokens but are difficult and inefficient to train due to their combinatorial nature. In this paper, we integrate both soft and…
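The contrast the abstract draws can be made concrete with a small sketch: soft attention weights every token through a differentiable softmax, while hard attention commits to a discrete subset of tokens. This is an illustrative NumPy sketch only; the function names, shapes, and the top-k selection rule are assumptions, not the paper's implementation.

```python
import numpy as np

def soft_attention(query, keys, values):
    """Soft attention: every token receives a probability weight (differentiable)."""
    scores = keys @ query                      # (n,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over all n tokens
    return weights @ values                    # weighted average over all tokens

def hard_attention(query, keys, values, k=3):
    """Hard attention: select a discrete subset of tokens (non-differentiable)."""
    scores = keys @ query
    selected = np.argsort(scores)[-k:]         # keep only the k highest-scoring tokens
    return values[selected].mean(axis=0)       # use only the selected tokens

# toy example: 6 tokens with 4-dimensional key/value vectors
rng = np.random.default_rng(0)
keys = rng.normal(size=(6, 4))
values = rng.normal(size=(6, 4))
query = rng.normal(size=4)
print(soft_attention(query, keys, values))
print(hard_attention(query, keys, values))
```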

Cited by 102 publications (67 citation statements) · References 2 publications
“…In practice, when treating content selection as latent variables, the model tends to end up with a trivial solution of always selecting all source tokens (Shen et al., 2018a; Ke et al., 2018). This behavior is understandable since Eq.…”
Section: Degree of Controllability
confidence: 99%
“…Out-of-vocabulary words in the testing dataset were represented by the GloVe embedding of their most similar word, as given by the Jaro-Winkler similarity metric [51]. [Accuracy table flattened during extraction: [7] 88.0 / 72.4 / 71.9; Nie and Bansal [31] 86.1 / 74.6 / 73.6; Chen et al. [8] 85.5 / 74.9 / 74.9; Conneau et al. [12] 85.0 / – / –; Densely Interactive Inference Network [16] 88.0 / 79.2 / 79.1; Directional Self-Attention Encoders [38] 85.6 / 71.0 / 71.4; Compare-Propagate Alignment-Factorized Encoders [42] 85.9 / 78.7 / 77.9; Gumbel TreeLSTM Encoders [10] 86.0 / – / –; Reinforced Self-Attention Network [39] 86.3 / – / –; Distance-Based Self-Attention Network [18] 86.…] The model was implemented with the Keras deep learning framework, and the corresponding source code is available on GitHub.…”
Section: Parameters Involved in the Proposed Approach
confidence: 99%
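The out-of-vocabulary back-off described in this excerpt can be sketched in a few lines. This is a minimal sketch, assuming the jellyfish package for Jaro-Winkler similarity; embed_token and the tiny glove dictionary are hypothetical stand-ins, not the cited authors' code or real GloVe vectors.

```python
import jellyfish  # assumed available; provides jaro_winkler_similarity

def embed_token(token, glove):
    """Look up a GloVe vector; for out-of-vocabulary tokens, back off to the
    embedding of the most similar in-vocabulary word under Jaro-Winkler."""
    if token in glove:
        return glove[token]
    closest = max(glove, key=lambda w: jellyfish.jaro_winkler_similarity(token, w))
    return glove[closest]

# toy stand-in for a full GloVe table (hypothetical vectors)
glove = {"color": [0.10, 0.25], "colour": [0.10, 0.20], "flavor": [0.90, 0.40]}
print(embed_token("colr", glove))  # OOV "colr" backs off to the closest spelling, "color"
```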
“…The MR, SST, TREC, CR, SUBJ and MPQA are evaluated with accuracy (Conneau et al. 2017). The MRPC is evaluated with both accuracy and F1 (Subramanian et al. 2018). The SICK-E and SICK-R are evaluated with Pearson correlation (Tai, Socher, and Manning 2015).…”
Section: Swapping Training for NLP Transfer Tasks
confidence: 99%
“…The essential component of CAFE is a compare-propagate architecture, which first compares the two text fragments and then propagates the aligned features to upper layers for representation learning. Shen et al. (2018) presented reinforced self-attention (ReSA), which aims to combine the benefits of soft attention with a newly proposed hard attention mechanism called reinforced sequence sampling (RSS). They further plugged ReSA into a source2token self-attention model and applied it to NLI tasks.…”
Section: Related Work
confidence: 99%
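To make this excerpt's description concrete, below is a minimal NumPy sketch of the idea it attributes to Shen et al. (2018): a hard, sampled token selection in the spirit of RSS whose binary mask then restricts a soft self-attention. The function names, shapes, sigmoid scorer, and Bernoulli sampler are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforced_sequence_sampling(token_feats, w):
    """RSS-style hard attention (sketch): sample a binary keep/drop decision per
    token from a Bernoulli whose probability is predicted from the token features;
    the discrete sampling is why training relies on reinforcement learning."""
    keep_prob = 1.0 / (1.0 + np.exp(-(token_feats @ w)))  # per-token keep probability
    return rng.binomial(1, keep_prob)                      # hard 0/1 selection mask

def masked_soft_self_attention(token_feats, mask):
    """Soft self-attention restricted to the tokens kept by the hard mask."""
    if not mask.any():
        mask = np.ones_like(mask)                          # degenerate case: keep all tokens
    scores = token_feats @ token_feats.T                   # pairwise alignment scores
    scores = np.where(mask[None, :] == 1, scores, -1e9)    # dropped tokens get ~zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ token_feats                           # one context vector per token

tokens = rng.normal(size=(8, 16))   # 8 tokens with 16-dimensional features
w = rng.normal(size=16)
mask = reinforced_sequence_sampling(tokens, w)
print(mask, masked_soft_self_attention(tokens, mask).shape)
```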