We observe heterogeneous visual encodings adopted to support feature attributions. The most common are heatmaps for local post hoc feature attributions on images (Figure 7b) [vdBCR*20, HSL*21, HJZ*21, ZZM16, HCC*20, WGZ*19, SW17, CBN*20, JVW20] and text [CHS20, CGR*17, ŠSE*21, JTH*21]; matrices [WONM18, DWB21, JKV*22, PCN*19, LLL*19], node‐link diagrams [JTH*21, Vig19, JCM20, LLL*19], and custom Sankey diagrams [DWSZ20, PCN*19, HSG20, MSHB22] for self‐explainable attentive models; bar charts [WWM20, PCN*19] or averaged inputs [WGYS18, WGSY19] for global feature attribution; and enhanced line charts [SMM*19, CWGvW19, LYY*20, SWJ*20, ŠSE*21], area charts [KCK*19], or bar charts [MXC*20, KCK*19, SWJ*20, WWM20] for post hoc approaches to sequential data. Among these, systems that support the analysis of attentive models, and of Transformers in particular, employ the most complex and novel visualization techniques (Figure 7a), such as radial [WTC21, DWB21] or grid layouts [WTC21, DWB21, ŠSE*21]: these systems must show the flow of attention weights across multiple layers simultaneously to help the user identify the most important features.
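Before such flows can be visualized, per-layer attention must usually be aggregated into a single token-to-token map. A minimal sketch of one common aggregation, attention rollout, is given below; the function name and the synthetic row-stochastic matrices are illustrative and not drawn from any of the cited systems:

```python
import numpy as np

def attention_rollout(attentions):
    """Aggregate per-layer attention matrices into one token-to-token
    map by mixing in the residual connection and multiplying layers."""
    n = attentions[0].shape[0]
    rollout = np.eye(n)
    for a in attentions:
        a_res = 0.5 * a + 0.5 * np.eye(n)            # account for residual path
        a_res /= a_res.sum(axis=-1, keepdims=True)    # keep rows stochastic
        rollout = a_res @ rollout                     # propagate through the layer
    return rollout

# Illustrative input: 3 layers of attention over 4 tokens.
rng = np.random.default_rng(0)
layers = [rng.random((4, 4)) for _ in range(3)]
layers = [a / a.sum(axis=-1, keepdims=True) for a in layers]  # normalize rows
r = attention_rollout(layers)
# Each row of r is a distribution over input tokens, suitable for
# rendering as a heatmap or as edge weights in a node-link diagram.
```

The resulting matrix is exactly the kind of cross-layer summary that the radial and grid layouts above encode, with matrix cells mapped to arc or cell opacity.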