Can Transformers Jump Around Right in Natural Language? Assessing Performance Transfer from SCAN

Chaabouni, Rahma; Dessì, Roberto; Kharitonov, Eugene

doi:10.18653/v1/2021.blackboxnlp-1.9

Cited by 14 publications

(16 citation statements)

References 21 publications

“…Although we learnt about mechanics involved in idiomatic translations, the vast majority of translations was still word for word, indicating that noncompositional processing does not emerge well (enough) in Transformer. Paradoxically, a recent trend is to encourage more compositional processing in NMT (Chaabouni et al, 2021; Li et al, 2021; Raunak et al, 2019, i.a.). We recommend caution since this inductive bias may harm idiom translations.…”

Section: Discussion (mentioning)

confidence: 99%

“…These patterns are stronger for figurative PIEs that the model paraphrases than for sentences that receive an overly compositional translation and hold across the seven European languages. Considering that a recent trend in NLP is to encourage even more compositional processing in NMT (Raunak et al, 2019; Chaabouni et al, 2021; Li et al, 2021, i.a.), we recommend caution.…”

Section: Introduction (mentioning)

confidence: 99%

Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation

Dankers, Lucas, Titov

2022

Preprint

Unlike literal expressions, idioms' meanings do not directly follow from their parts, posing a challenge for neural machine translation (NMT). NMT models are often unable to translate idioms accurately and over-generate compositional, literal translations. In this work, we investigate whether the non-compositionality of idioms is reflected in the mechanics of the dominant NMT model, Transformer, by analysing the hidden states and attention patterns for models with English as source language and one of seven European languages as target language. When Transformer emits a non-literal translation (i.e. identifies the expression as idiomatic), the encoder processes idioms more strongly as single lexical units compared to literal expressions. This manifests in idioms' parts being grouped through attention and in reduced interaction between idioms and their context. In the decoder's cross-attention, figurative inputs result in reduced attention on source-side tokens. These results suggest that Transformer's tendency to process idioms as compositional expressions contributes to literal translations of idioms.

“…Ruis, Burghouts, & Bucur, 2021), natural language processing Baroni, 2020; Keysers et al, 2020; Kim & Linzen, 2020), and more generally (Nam & McClelland, 2021). Two fundamentally different approaches are taken by the literature; one utilizes additional data while making few changes to the conventional setup and architecture (Furrer, van Zee, Scales, & Schärli, 2020), while the other utilizes additional inductive biases that aim to support systematic generalization (Russin et al, 2019; Lake, 2019; Andreas, 2020; Nye et al, 2020; Gordon et al, 2020; Bogin et al, 2021; Chaabouni, Dessì, & Kharitonov, 2021). In this work we apply both approaches, the former through data augmentation, and the latter through high-level modularity.…”

Section: Related Work (mentioning)

confidence: 99%

Improving Systematic Generalization Through Modularity and Augmentation

Ruis, Lake

2022

Preprint

Systematic generalization is the ability to combine known parts into novel meaning; an important aspect of efficient human learning, but a weakness of neural network learning. In this work, we investigate how two well-known modeling principles, modularity and data augmentation, affect systematic generalization of neural networks in grounded language learning. We analyze how large the vocabulary needs to be to achieve systematic generalization and how similar the augmented data needs to be to the problem at hand. Our findings show that even in the controlled setting of a synthetic benchmark, achieving systematic generalization remains very difficult. After training on an augmented dataset with almost forty times more adverbs than the original problem, a non-modular baseline is not able to systematically generalize to a novel combination of a known verb and adverb. When separating the task into cognitive processes like perception and navigation, a modular neural network is able to utilize the augmented data and generalize more systematically, achieving 70% and 40% exact match increase over state-of-the-art on two gSCAN tests that have not previously been improved. We hope that this work gives insight into the drivers of systematic generalization, and what we still need to improve for neural networks to learn more like humans do.

“…The first prominent type of generalisation that can be found in the literature is compositional generalisation, which is often argued to underpin human's ability to quickly generalise to new data, tasks and domains (Fodor and Pylyshyn, 1988; Lake et al, 2017; Marcus, 2018; Schmidhuber, 1990). Because of this strong connection with humans and human language, work on compositional generalisation often has a primarily cognitive motivation, although practical concerns such as sample efficiency, quick adaptation and good generalisation in low-resource scenarios are frequently mentioned as additional or alternative motivations (Chaabouni et al, 2021; Linzen, 2020, to give just a few examples). While it has a strong intuitive appeal and clear mathematical definition (Montague, 1970), compositional generalisation is not easy to pin down empirically.…”

Section: Compositional Generalisation (mentioning)

confidence: 99%

State-of-the-art generalisation research in NLP: A taxonomy and review

Hupkes, Giulianelli, Dankers, et al.

2022

Preprint

The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is not well understood, nor are there any common standards to evaluate it. In this paper, we aim to lay the groundwork to improve both of these issues. We present a taxonomy for characterising and understanding generalisation research in NLP, we use that taxonomy to present a comprehensive map of published generalisation studies, and we make recommendations for which areas might deserve attention in the future. Our taxonomy is based on an extensive literature review of generalisation research, and contains five axes along which studies can differ: their main motivation, the type of generalisation they aim to solve, the type of data shift they consider, the source by which this data shift is obtained, and the locus of the shift within the modelling pipeline. We use our taxonomy to classify over 400 previous papers that test generalisation, for a total of more than 600 individual experiments. Considering the results of this review, we present an in-depth analysis of the current state of generalisation research in NLP, and make recommendations for the future. Along with this paper, we release a webpage where the results of our review can be dynamically explored, and which we intend to update as new NLP generalisation studies are published. With this work, we aim to make steps towards making state-of-the-art generalisation testing the new status quo in NLP.
