Measuring the Mixing of Contextual Information in the Transformer

Ferrando, Juan; Gállego, Gerard I.; Costa-jussà, Marta R.

doi:10.48550/arxiv.2203.04212

Cited by 2 publications

(6 citation statements)

References 18 publications

“…with the residual connection $x_i$ only considered in the transformed vector $T_i(x_{j=i})$. Ferrando et al. (2022) propose to use the Manhattan distance between the output vector and the transformed vector as a measure of the impact of $x_j$ on $x_i$:…”

Section: Aggregation of Layer-wise Token-to-token Interactions (ALTI), mentioning

confidence: 99%
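
The citation statement above describes scoring the impact of $x_j$ on $x_i$ with a Manhattan distance, but its equation is cut off. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that the output vector is the sum of the transformed vectors, and that a smaller distance is turned into a larger contribution by subtracting it from the L1 norm of the output, clipping at zero, and normalizing each row. The names `token_contributions` and `transformed` are hypothetical.

```python
from typing import List

Vector = List[float]


def l1_norm(v: Vector) -> float:
    """Manhattan (L1) norm of a vector."""
    return sum(abs(a) for a in v)


def token_contributions(transformed: List[List[Vector]]) -> List[List[float]]:
    """transformed[i][j] is the (hypothetical) transformed vector T_i(x_j).

    Returns a matrix whose row i holds the contribution of each token j
    to token i. The normalization used here is an assumption.
    """
    contributions = []
    for row in transformed:
        dim = len(row[0])
        # Assumption: the output vector y_i is the sum of the transformed vectors.
        y = [sum(vec[k] for vec in row) for k in range(dim)]
        y_norm = l1_norm(y)
        # Manhattan distance between y_i and each transformed vector.
        distances = [l1_norm([a - b for a, b in zip(y, vec)]) for vec in row]
        # Assumption: closer vectors contribute more; clip negative scores at zero.
        scores = [max(0.0, y_norm - d) for d in distances]
        total = sum(scores)
        contributions.append([s / total if total > 0 else 0.0 for s in scores])
    return contributions


example = [
    [[1.0, 0.0], [0.0, 3.0]],  # transformed vectors that build token 0
    [[2.0, 2.0], [1.0, 1.0]],  # transformed vectors that build token 1
]
C = token_contributions(example)
```

With these toy vectors, the first row of `C` is `[0.25, 0.75]`: the second token's transformed vector lies closer to the output, so it receives the larger share.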

“…where each row in $C_{x_i \leftarrow x}$ contains the contribution, or influence, of each $x_j$ in $x_i$, i.e., the contribution of token representation $j$ to token representation $i$. The ALTI method (Ferrando et al., 2022) follows the Transformer modeling approach proposed by Abnar and Zuidema (2020), where the information flow in the model is simplified as a Directed Acyclic Graph, where nodes are token representations, and edges represent the influence of each input layer token $x_j$ in the output token $x_i$. ALTI proposes using token contributions $C$ instead of raw attention weights $\alpha$.…”

Section: Aggregation of Layer-wise Token-to-token Interactions (ALTI), mentioning

confidence: 99%
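
The rollout-style aggregation described above can be sketched as a chain of matrix products over per-layer contribution matrices. This is an illustrative sketch under stated assumptions, not the published implementation: it assumes each layer supplies a row-normalized matrix whose entry (i, j) is the contribution of token j to token i, and that layers are listed from first to last. The names `matmul` and `rollout` are hypothetical helpers.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][c] for k in range(inner)) for c in range(cols)]
        for row in a
    ]


def rollout(layer_contributions: List[Matrix]) -> Matrix:
    """Aggregate per-layer contributions into input-to-output contributions.

    Assumption: layer_contributions[0] belongs to the first layer, and each
    later layer's matrix is multiplied on the left of the running product.
    """
    result = layer_contributions[0]
    for layer in layer_contributions[1:]:
        result = matmul(layer, result)
    return result


layers = [
    [[0.5, 0.5], [0.0, 1.0]],  # layer 1 contributions
    [[1.0, 0.0], [0.5, 0.5]],  # layer 2 contributions
]
R = rollout(layers)
```

Because every per-layer matrix in this toy example is row-normalized, each row of `R` still sums to one, so it can be read as a distribution over input tokens.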

“…Concurrently, encoder-based Transformers, such as BERT (Devlin et al., 2019) and RoBERTa, have been analysed with attention rollout (Abnar and Zuidema, 2020), which models the information flow in the model with a Directed Acyclic Graph, where nodes are token representations and edges, attention weights. Recently, Ferrando et al. (2022) have presented ALTI (Aggregation of Layer-wise Tokens Attributions), which applies the attention rollout method by substituting attention weights with refined token-to-token interactions. In this work, we present the first application of a rollout-based method to the encoder-decoder Transformers.…”

Section: Introduction, mentioning

confidence: 99%

“…In this section, we provide the background to understand our proposed method by briefly explaining the encoder-decoder Transformer-based model in the context of NMT (Vaswani et al., 2017) and the Aggregation of Layer-wise Token-to-token Interactions (ALTI) method (Ferrando et al., 2022).…”

Section: Introduction, mentioning

confidence: 99%

Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer

Ferrando, Gállego, Belen et al. 2022

Preprint (self-citation)

In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix (what has been previously translated at a decoding step). However, previous work on interpretability in NMT has focused solely on source sentence tokens attributions. Therefore, we lack a full understanding of the influences of every input token (source sentence and target prefix) in the model predictions. In this work, we propose an interpretability method that tracks complete input token attributions. Our method, which can be extended to any encoder-decoder Transformer-based model, allows us to better comprehend the inner workings of current NMT models. We apply the proposed method to both bilingual and multilingual Transformers and present insights into their behaviour.

Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer

Ferrando, Gállego, Belen et al. 2022

Preprint (self-citation)

“…In the computer vision literature, Chefer et al. (2021b,a) combined this method with gradient information. Recently, Ferrando et al. (2022) have presented ALTI (Aggregation of Layer-wise Tokens Attributions), which applies the attention rollout method by substituting attention weights with refined token-to-token interactions. In this work, we present the first application of a rollout-based method to sequence-to-sequence Transformers.…”

Section: Introduction, mentioning

confidence: 99%

Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer

Ferrando, Gállego, Belen et al. 2022

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix (what has been previously translated at a decoding step). However, previous work on interpretability in NMT has mainly focused on source sentence tokens' attributions. Therefore, we lack a full understanding of the influences of every input token (source sentence and target prefix) in the model predictions. In this work, we propose an interpretability method that tracks input tokens' attributions for both contexts. Our method, which can be extended to any encoder-decoder Transformer-based model, allows us to better comprehend the inner workings of current NMT models. We apply the proposed method to both bilingual and multilingual Transformers and present insights into their behaviour.
