Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.39
Explaining Contextualization in Language Models using Visual Analytics

Abstract: Despite the success of contextualized language models on various NLP tasks, it is still unclear what these models really learn. In this paper, we contribute to the current efforts of explaining such models by exploring the continuum between function and content words with respect to contextualization in BERT, based on linguistically-informed insights. In particular, we utilize scoring and visual analytics techniques: we use an existing similarity-based score to measure contextualization and integrate it into a…
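The similarity-based score mentioned in the abstract is not spelled out in this snippet. As a rough illustration of how a similarity-based contextualization measure can be computed, the sketch below averages the pairwise cosine similarity of one token's embeddings across different sentence contexts; the function name and the random example data are illustrative, not taken from the paper.

import numpy as np

def self_similarity(token_vectors: np.ndarray) -> float:
    """Average pairwise cosine similarity of one token's embeddings across
    different contexts; values near 1 mean the embedding barely changes
    (weak contextualization), lower values mean stronger contextualization.
    token_vectors has shape (n_contexts, hidden_dim)."""
    normed = token_vectors / np.linalg.norm(token_vectors, axis=1, keepdims=True)
    sims = normed @ normed.T                          # cosine similarity matrix
    n = token_vectors.shape[0]
    return float(sims[~np.eye(n, dtype=bool)].mean()) # drop self-comparisons

# Example with random vectors standing in for per-context BERT outputs:
vecs = np.random.rand(5, 768)
print(f"self-similarity: {self_similarity(vecs):.3f}")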

Cited by 9 publications (11 citation statements)
References 42 publications
“…[43] More specifically, this means that many words will need to have several different embeddings so that a context-dependent choice can be made for each situation.[44] To achieve this, the use of deep learning models is a popular choice, and approaches have, for example, been developed for recursive neural networks,[45] convolutional neural networks,[46] and recurrent neural networks.[47] Arguably, the current state-of-the-art technology for text embedding is the Universal Sentence Encoder (USE),[48] but the previously mentioned BERT algorithm also works for text of sentence length.…”
Section: Related Work
confidence: 99%
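To illustrate the point quoted above, that a contextualized model such as BERT assigns the same word a different embedding in each sentence, the sketch below extracts last-layer BERT vectors for one word in two contexts and compares them. It assumes the HuggingFace transformers and torch packages and the bert-base-uncased checkpoint; the helper word_embedding is a hypothetical name, not an API from the cited work.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    # Return the last-layer vector of `word` as it occurs in `sentence`
    # (assumes the word maps to a single subtoken, as "bank" does here).
    enc = tokenizer(sentence, return_tensors="pt")
    position = (enc["input_ids"][0] == tokenizer.convert_tokens_to_ids(word)).nonzero()[0].item()
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state       # (1, seq_len, hidden_dim)
    return hidden[0, position]

v1 = word_embedding("she sat on the river bank", "bank")
v2 = word_embedding("he deposited cash at the bank", "bank")
# The vectors differ because each token is conditioned on its context.
print(f"cosine similarity across contexts: {torch.cosine_similarity(v1, v2, dim=0).item():.3f}")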
“…[43] More specifically, this means that many words will need to have several different embeddings so that a context-dependent choice can be made for each situation.[44] To achieve this, the …”
[Figure 1 caption from the citing paper: Using the EEVO tool to visualize the performance of embedding-based ensembles conducting text similarity calculations on a large set of scientific publications (see further Section Visualization).]
Section: Word and Text Embedding
confidence: 99%
“…Most common explainability techniques either use supervised probing methods, i.e., linear classification models predicting specific linguistic properties (e.g., [Eth19]), or apply adversarial testing to conclude about models' capability of learning specific context properties (e.g., [MPL19]). However, the findings of these two strands of research are often contradictory [SKB∗21]. At the same time, visual analytics approaches are used for the explainability of embedding contextualization.…”
Section: Introduction
confidence: 99%
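A supervised probe of the kind mentioned in this statement can be sketched as a linear classifier trained on frozen embeddings to predict a linguistic property. The snippet below uses scikit-learn with random placeholder data standing in for real embeddings and part-of-speech tags; it is not the setup of any of the cited papers.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: X holds frozen token embeddings from one model layer,
# y holds one linguistic label per token (e.g., a POS tag id).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 768))
y = rng.integers(0, 12, size=2000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Linear probe: if this simple classifier scores well, the property is
# linearly decodable from the frozen representations.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.3f}")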
“…A further, main contribution of this paper is an interactive explanation workspace that visualizes the computed score values. The visual representation of the scores is crucial due to the huge amount of data that is generated and has to be investigated, and because the embedding contextualization differs depending on the token's role (e.g., meaning or function) in its context [SKB∗21]. Visualizations are effective means for generating insights into such (complex) data patterns [KAF∗08].…”
Section: Introduction
confidence: 99%
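One simple way to present such per-token score values, loosely in the spirit of the workspace described above but in no way reproducing it, is a layer-by-token heatmap; the scores below are random placeholders.

import numpy as np
import matplotlib.pyplot as plt

tokens = ["the", "river", "bank", "was", "steep"]
n_layers = 12
scores = np.random.rand(n_layers, len(tokens))    # rows: layers, cols: tokens

fig, ax = plt.subplots(figsize=(6, 4))
im = ax.imshow(scores, aspect="auto", cmap="viridis")
ax.set_xticks(range(len(tokens)))
ax.set_xticklabels(tokens)
ax.set_yticks(range(n_layers))
ax.set_yticklabels([f"layer {i + 1}" for i in range(n_layers)])
fig.colorbar(im, ax=ax, label="contextualization score")
plt.show()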
“…Contextualized LMs are usually pre-trained on a language modeling task (e.g., next word prediction) and are used as transfer-learning methods in other NLP tasks [64]. Adaptation to tasks is typically carried out through fine-tuning of the model, or part of it, on domain-specific data.…”
Section: From Text To Vectors
confidence: 99%
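A minimal sketch of the fine-tuning step described above, assuming HuggingFace transformers and torch; the toy in-memory texts, labels, and hyperparameters are placeholders, not taken from the cited work.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# A fresh classification head is added on top of the pre-trained encoder.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["the visualization is helpful", "the interface is confusing"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):                          # a few gradient steps on domain data
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()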