“…We observe that neural interpretation approaches fall within several broad categories: visualizations and heatmaps (Karpathy et al., 2015; Strobelt et al., 2016), gradient-based analyses (Potapenko et al., 2017; Samek et al., 2017b; Bach et al., 2015; Arras et al., 2017), learning disentangled representations during training (Whitney, 2016; Siddharth et al., 2017; Esmaeili et al., 2018), and model probes (Shi et al., 2016a; Adi et al., 2016; Conneau et al., 2018; Zhu et al., 2018; Kuncoro et al., 2018; Khandelwal et al., 2018). Our work uses linear probes to identify the function of groups of neurons that are correlated with linguistic and task-level features, rather than to interpret individual neurons.…”
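
To make the linear-probe idea concrete, here is a minimal sketch of probing frozen activations for a feature. It is illustrative only, not the paper's setup: the `hidden_states` matrix, the synthetic binary feature, and the use of scikit-learn's `LogisticRegression` are all assumptions introduced here.

```python
# Minimal linear-probe sketch (illustrative; not the paper's exact method).
# Assumes `hidden_states` is an (n_examples, n_neurons) array of frozen model
# activations and `labels` holds one linguistic/task-level feature per example.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 256))                  # stand-in for real activations
labels = (hidden_states[:, :8].sum(axis=1) > 0).astype(int)   # synthetic binary feature

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0
)

# Fit a linear classifier on the frozen activations; held-out probe accuracy
# indicates how linearly decodable the feature is from the representation.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.3f}")

# Large-magnitude probe weights flag the group of neurons most correlated
# with the feature, i.e. probing neuron groups rather than single units.
top_neurons = np.argsort(-np.abs(probe.coef_[0]))[:10]
print("top contributing neurons:", top_neurons)
```

Because the probe is linear, its weight vector doubles as a simple attribution over neurons, which is what distinguishes this use (locating correlated neuron groups) from per-neuron interpretation.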