“…Attribution methods have been used to examine linguistic patterns in model behaviour, and it has been argued they provide more comprehensive insights than attention heatmaps (Bastings and Filippova, 2020), because attention only determines feature importance within a particular attention head, and not for model predictions as a whole (Jain and Wallace, 2019). Linguistic phenomena investigated using attribution methods include co-reference, negation, and syntactic structure (Jumelet et al, 2019; Wu et al, 2021; Nayak and Timmapathini, 2021; Jumelet and Zuidema, 2023). Within conversational NLP, feature attribution methods have been used to identify salient features in task-oriented dialogue modelling (Huang et al, 2020), dialogue response generation (Tuan et al, 2021), and turn-taking prediction (Ekstedt and Skantze, 2020).…”