Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP 2020
DOI: 10.18653/v1/2020.blackboxnlp-1.10
The Explanation Game: Towards Prediction Explainability through Sparse Communication

Abstract: Explainability is a topic of growing importance in NLP. In this work, we provide a unified perspective of explainability as a communication problem between an explainer and a layperson about a classifier's decision. We use this framework to compare several explainers, including gradient methods, erasure, and attention mechanisms, in terms of their communication success. In addition, we reinterpret these methods in the light of classical feature selection, and use this as inspiration for new embedded explainers…

Cited by 23 publications (37 citation statements) · References 46 publications
“…For all rationalizers, we map each input word to 300D-pretrained GloVe embeddings from 840B release (Pennington et al, 2014) that are kept frozen. We instantiate all encoder networks as bidirectional LSTM (Hochreiter and Schmidhuber, 1997) layers (BiLSTM) (w/ hidden size 200) similarly to Lei et al (2016); Bastings et al (2019); Treviso and Martins (2020). Although other works (Jain et al, 2020;Paranjape et al, 2020) use more powerful BERT-based representations, we firstly experimented with BiLSTM layers and noticed our results were competitive with those reported in Jain et al (2020).…”
Section: C1 Rationalizers Experimental Setup (mentioning)
confidence: 99%
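The setup quoted above (frozen 300-D GloVe embeddings feeding a single-layer BiLSTM encoder with hidden size 200) can be illustrated with a minimal PyTorch sketch. The class name `RationalizerEncoder` and the `glove_vectors` argument are illustrative placeholders, not identifiers from the cited papers.

```python
# Minimal sketch, assuming PyTorch: frozen 300-D GloVe embeddings followed by a
# single-layer BiLSTM encoder with hidden size 200, as described in the quote above.
import torch
import torch.nn as nn

class RationalizerEncoder(nn.Module):
    def __init__(self, glove_vectors: torch.Tensor, hidden_size: int = 200):
        super().__init__()
        # Pretrained embeddings kept frozen during training (e.g. GloVe 840B, 300-D).
        self.embedding = nn.Embedding.from_pretrained(glove_vectors, freeze=True)
        self.bilstm = nn.LSTM(
            input_size=glove_vectors.size(1),  # 300 for GloVe
            hidden_size=hidden_size,
            batch_first=True,
            bidirectional=True,
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> hidden states: (batch, seq_len, 2 * hidden_size)
        embedded = self.embedding(token_ids)
        hidden_states, _ = self.bilstm(embedded)
        return hidden_states

# Usage with a hypothetical 10-word vocabulary of 300-D vectors:
vectors = torch.randn(10, 300)
encoder = RationalizerEncoder(vectors)
states = encoder(torch.tensor([[1, 2, 3]]))  # shape: (1, 3, 400)
```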
“…These approaches are often brittle and fragile for the high sensitivity that they show to changes in the hyperparameters and to variability due to sampling. On the other hand, existing rationalizers that use sparse attention mechanisms (Treviso and Martins, 2020) such as sparsemax attention, while being deterministic and end-to-end differentiable, do not have a direct handle to constrain the rationale in terms of sparsity and contiguity. We endow them with these capabilities in this paper as shown in Table 1, where we position our work in the literature for highlights extraction.…”
Section: Rationalization For Highlights Extraction (mentioning)
confidence: 99%
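The sparse attention mentioned above replaces softmax with the sparsemax transformation (Martins and Astudillo, 2016), which projects the score vector onto the probability simplex and can assign exactly zero weight to many tokens, yielding a deterministic, end-to-end differentiable selection of words. A minimal sketch follows, assuming PyTorch; the `sparsemax` helper is a self-contained illustration, not the authors' implementation.

```python
import torch

def sparsemax(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Sparsemax (Martins and Astudillo, 2016): Euclidean projection of the score
    # vector z onto the probability simplex; many outputs are exactly zero.
    z_sorted, _ = torch.sort(z, descending=True, dim=dim)
    z_cumsum = z_sorted.cumsum(dim)
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    shape = [1] * z.dim()
    shape[dim] = -1
    k = k.view(shape)
    # Support: positions where 1 + k * z_sorted exceeds the cumulative sum.
    support = (1 + k * z_sorted) > z_cumsum
    support_size = support.sum(dim=dim, keepdim=True)
    tau = (z_cumsum.gather(dim, support_size - 1) - 1) / support_size.to(z.dtype)
    return torch.clamp(z - tau, min=0.0)

# Example: attention scores over 5 tokens. Softmax would give every token a
# nonzero weight; sparsemax zeroes out the low-scoring ones.
scores = torch.tensor([2.0, 1.5, 0.1, -1.0, -2.0])
probs = sparsemax(scores)  # tensor([0.7500, 0.2500, 0.0000, 0.0000, 0.0000])
assert torch.isclose(probs.sum(), torch.tensor(1.0))
```

Because the output is a deterministic, differentiable function of the scores, such attention-based rationalizers avoid the sampling variance of stochastic mask models, which is the contrast drawn in the quoted statement.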
“…Chen and Ji (2020) propose learning a variational word mask to improve model interpretability. Finally, extracting a short snippet from the original input text (rationale) and using it to make a prediction has been recently proposed (Lei et al, 2016;Bastings et al, 2019;Treviso and Martins, 2020;Jain et al, 2020;Chalkidis et al, 2021). Nguyen (2018) and Atanasova et al (2020) compare explanations produced by different approaches, showing that in most cases gradientbased approaches outperform sparse linear metamodels.…”
Section: Model Interpretability (mentioning)
confidence: 99%