2018
DOI: 10.48550/arxiv.1805.09657
Preprint

Learning compositionally through attentive guidance

Abstract: While neural network models have been successfully applied to domains that require substantial generalisation skills, recent studies have implied that they struggle when the task they are trained on requires inferring its underlying compositional structure. In this paper, we introduce Attentive Guidance, a mechanism to direct a sequence-to-sequence model equipped with attention to find more compositional solutions. We test it on two tasks, devised precisely to assess the compositional capabilities of n…
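The abstract does not spell out the training objective, but a common way to realise this kind of attention supervision is an auxiliary loss that pulls the model's attention distribution toward a prescribed alignment at each decoding step. The sketch below is a minimal PyTorch illustration of that idea, assuming hypothetical tensor shapes and a `guidance_weight` hyperparameter; it is not the authors' implementation.

```python
# Minimal sketch (not the authors' code): an "attentive guidance"-style
# auxiliary loss that supervises a seq2seq model's attention weights with a
# target (gold) alignment pattern. Shapes and names are assumptions.
import torch
import torch.nn.functional as F

def guided_loss(logits, targets, attn_weights, gold_attn, guidance_weight=1.0):
    """
    logits:       (batch, tgt_len, vocab)   decoder output scores
    targets:      (batch, tgt_len)          gold output tokens
    attn_weights: (batch, tgt_len, src_len) attention distribution per decoder step
    gold_attn:    (batch, tgt_len, src_len) prescribed alignment (e.g. one-hot rows)
    """
    # Standard sequence cross-entropy on the output tokens.
    token_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
    )
    # Auxiliary guidance term: cross-entropy between the prescribed alignment
    # and the model's (post-softmax) attention distribution.
    eps = 1e-8
    guidance_loss = -(gold_attn * (attn_weights + eps).log()).sum(-1).mean()
    return token_loss + guidance_weight * guidance_loss
```

Both terms would be backpropagated together during training; at test time no guidance target is needed.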

Cited by 8 publications (12 citation statements)
References 8 publications
“…Our method draws inspiration from the work on compositional learning of Hupkes et al (2018a). The authors introduce the concept of Attentive Guidance, a training signal given to the attention mechanism of a seq2seq model to induce more compositional solutions.…”
Section: Models
confidence: 99%
“…We borrow the setup presented in Hupkes et al (2018a), which differs slightly from the setup as it was originally presented. In this setup, a typical input output example could be 001 t1 t2 → 001 010 111.…”
Section: Task
confidence: 99%
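For concreteness, the sketch below mimics the input-output format quoted above ("001 t1 t2 → 001 010 111"): lookup tables are bijections over 3-bit strings, and the target sequence lists the input followed by each intermediate result of applying the tables in order. The specific table contents are invented for illustration (chosen so that the quoted example holds); the actual task uses randomly generated tables.

```python
# Illustrative sketch of the lookup-table composition format.
# The table contents below are hypothetical, not the task's real tables.
t1 = {"000": "111", "001": "010", "010": "000", "011": "101",
      "100": "110", "101": "001", "110": "011", "111": "100"}
t2 = {"000": "001", "001": "100", "010": "111", "011": "000",
      "100": "101", "101": "011", "110": "010", "111": "110"}
TABLES = {"t1": t1, "t2": t2}

def apply_tables(bits, table_names):
    """Return the target sequence: input followed by each intermediate result."""
    outputs = [bits]
    for name in table_names:
        bits = TABLES[name][bits]
        outputs.append(bits)
    return " ".join(outputs)

print(apply_tables("001", ["t1", "t2"]))  # -> "001 010 111" with these tables
```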
“…Our findings indicate the following. Firstly, as hypothesized before (Dessì and Baroni, 2019; Hupkes et al, 2018), the limited attention span provides a useful inductive bias that allows models to perform better on compositional generalization induction, that SCAN probes for. Further, endowing a model with SCAN-style generalization capabilities can lead to improvements in low-resource and distribution-shifted scenarios as long as we ensure that we do not overfit to SCAN.…”
Section: Discussion
confidence: 99%
“…Phrase and sentence composition has drawn frequent attention in analysis of neural models, often focusing on analysis of internal representations and downstream task behavior (Ettinger et al, 2018; Conneau et al, 2019; Nandakumar et al, 2019; Yu and Ettinger, 2020; Bhathena et al, 2020; Mu and Andreas, 2020; Andreas, 2019). Some work investigates compositionality via constructing linguistic (Keysers et al, 2019) and non-linguistic (Liška et al, 2018; Hupkes et al, 2018; Baan et al, 2019) synthetic datasets.…”
(Datasets and code available at https://github.com/yulang/fine-tuning-and-compositionin-transformers)
Section: Related Work
confidence: 99%