TimeSHAP: Explaining Recurrent Models through Sequence Perturbations

Bento, João; Saleiro, Pedro; Cruz, André Ferreira; Figueiredo, Mário A. T.; Bizarro, Pedro

doi:10.1145/3447548.3467166

Cited by 45 publications

(18 citation statements)

References 17 publications


“…In LSTM-based models, attention is computed over hidden representations across timesteps, which does not provide faithful token-level attribution. Approaches that trace explanations back to individual timesteps (Bento et al, 2020) or input tokens (Tutek and Snajder, 2020) are only just emerging. Therefore, we limit ourselves to an analysis of the raw attention weights.…”

Section: Attention-based Explanations

mentioning

confidence: 99%

Order in the Court: Explainable AI Methods Prone to Disagreement

Neely, F., Bleeker et al. 2021

Preprint


In Natural Language Processing, feature-additive explanation methods quantify the independent contribution of each input token towards a model's decision. By computing the rank correlation between attention weights and the scores produced by a small sample of these methods, previous analyses have sought to either invalidate or support the role of attention-based explanations as a faithful and plausible measure of salience. To investigate what measures of rank correlation can reliably conclude, we comprehensively compare feature-additive methods, including attention-based explanations, across several neural architectures and tasks. In most cases, we find that none of our chosen methods agree. Therefore, we argue that rank correlation is largely uninformative and does not measure the quality of feature-additive methods. Additionally, the range of conclusions a practitioner may draw from a single explainability algorithm are limited.


“…Numerous novel methods have been developed to tackle medically-relevant tasks in the time domain, such as: prediction [10,11], causal inference [12,13,14], time-to-event analysis [15,16,17], clustering [18,19], as well as data imputation [20,21], and model interpretability [22,23] methods, among others. Yet currently a significant limitation exists in the lack of standardization of both data representation and model benchmarking [7,9].…”

Section: Abstract Machine Learning • Time Series • Medicine 1 Time Do...

mentioning

confidence: 99%

“…Furthermore, availability of open access data in this field is also improving [4,5,6], attracting significant attention from the artificial intelligence (AI), machine learning (ML) and deep learning (DL) research, as well as the medical data science communities [7,8,9]. As such, it is evident that the temporal setting is becoming the cornerstone for ML in healthcare and medicine, with a significant potential for impact. Numerous novel methods have been developed to tackle medically-relevant tasks in the time domain, such as: prediction [10,11], causal inference [12,13,14], time-to-event analysis [15,16,17], clustering [18,19], as well as data imputation [20,21], and model interpretability [22,23] methods, among others. Yet currently a significant limitation exists in the lack of standardization of both data representation and model benchmarking [7,9].…”

mentioning

confidence: 99%

TemporAI: Facilitating Machine Learning Innovation in Time Domain Tasks for Medicine

Saveliev, Schaar 2023

Preprint


TemporAI is an open source Python software library for machine learning (ML) tasks involving data with a time component, focused on medicine and healthcare use cases. It supports data in time series, static, and event modalities and provides an interface for prediction, causal inference, and time-to-event analysis, as well as common preprocessing utilities and model interpretability methods. The library aims to facilitate innovation in the medical ML space by offering a standardized temporal setting toolkit for model development, prototyping and benchmarking, bridging the gaps in the ML research, healthcare professional, medical/pharmacological industry, and data science communities. TemporAI is available on GitHub¹ and we welcome community engagement through use, feedback, and code contributions.

Keywords: Machine Learning • Time Series • Medicine

1 Time domain is crucial for ML in medicine

Data with a time component² are ubiquitous in modern healthcare and medicine: from patient electronic health records (EHRs) [1], to data streams from Internet-of-Things (IoT) devices and consumer wearables [2], to large public health datasets [3], naming just a few key growing areas. In fact, since patient information is typically associated with a particular time point, the vast majority of healthcare data is temporal, and may be viewed as a time series. Furthermore, availability of open access data in this field is also improving [4,5,6], attracting significant attention from the artificial intelligence (AI), machine learning (ML) and deep learning (DL) research, as well as the medical data science communities [7,8,9]. As such, it is evident that the temporal setting is becoming the cornerstone for ML in healthcare and medicine, with a significant potential for impact.

Numerous novel methods have been developed to tackle medically-relevant tasks in the time domain, such as: prediction [10,11], causal inference [12,13,14], time-to-event analysis [15,16,17], clustering [18,19]³, as well as data imputation [20,21], and model interpretability [22,23] methods, among others. Yet currently a significant limitation exists in the lack of standardization of both data representation and model benchmarking [7,9]. TemporAI addresses these limitations as the first toolkit for development, prototyping and benchmarking of ML models on medically-relevant tasks with time series, static, and event data modalities.

¹ https://github.com/vanderschaarlab/temporai

² Depending on the context, referred to alternatively as: temporal, longitudinal, or time series data.

³ In medical and other contexts, these tasks may also be referred to as, respectively: forecasting, (individualized) treatment effect estimation, survival analysis, phenotyping. The descriptor "temporal" may be used to contrast with the static task setting.


“…In LSTM-based models, attention is computed over hidden representations across timesteps, which does not provide faithful token-level importance scores. Approaches that trace explanations back to individual timesteps [41] or input tokens [42] are only just emerging. Therefore, we analyze the raw attention weights for the LSTM-based model we consider below (see Section 4.2).…”

Section: Explanations From Attention Mechanisms

mentioning

confidence: 99%

A Song of (Dis)agreement: Evaluating the Evaluation of Explainable Artificial Intelligence in Natural Language Processing

Neely, Bleeker et al. 2022

HHAI2022: Augmenting Human Intellect


There has been significant debate in the NLP community about whether or not attention weights can be used as an explanation – a mechanism for interpreting how important each input token is for a particular prediction. The validity of “attention as explanation” has so far been evaluated by computing the rank correlation between attention-based explanations and existing feature attribution explanations using LSTM-based models. In our work, we (i) compare the rank correlation between five more recent feature attribution methods and two attention-based methods, on two types of NLP tasks, and (ii) extend this analysis to also include transformer-based models. We find that attention-based explanations do not correlate strongly with any recent feature attribution methods, regardless of the model or task. Furthermore, we find that none of the tested explanations correlate strongly with one another for the transformer-based model, leading us to question the underlying assumption that we should measure the validity of attention-based explanations based on how well they correlate with existing feature attribution explanation methods. After conducting experiments on five datasets using two different models, we argue that the community should stop using rank correlation as an evaluation metric for attention-based explanations. We suggest that researchers and practitioners should instead test various explanation methods and employ a human-in-the-loop process to determine if the explanations align with human intuition for the particular use case at hand.
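The rank-correlation check that this abstract discusses can be illustrated with a minimal sketch. It assumes Kendall's tau (the tau-a variant) as the coefficient, which is one common choice; the abstract does not say which coefficient the authors used. The function name `kendall_tau` and the five-token score lists are hypothetical and made up for illustration, not taken from the paper.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(a: Sequence[float], b: Sequence[float]) -> float:
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        # Positive product: both methods order tokens i and j the same way.
        product = (a[i] - a[j]) * (b[i] - b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-token scores for one five-token input (illustrative only).
attention_weights = [0.05, 0.40, 0.10, 0.30, 0.15]
attribution_scores = [0.02, 0.10, 0.35, 0.33, 0.20]

tau = kendall_tau(attention_weights, attribution_scores)
print(f"Kendall tau between the two explanations: {tau:.2f}")
```

A value near 1 would mean the two explanations rank the tokens almost identically, a value near -1 that they rank them in opposite order, and a value near 0 that the rankings are unrelated. The abstract's argument is that a low value by itself does not show which explanation, if either, is the faithful one.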

