Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the Predictive, Descriptive, Relevant (PDR) framework for discussing interpretations. The PDR framework provides three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post-hoc categories, with sub-groups including sparsity, modularity, and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often under-appreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.

* For clarity, throughout the paper we use the term model to refer to both machine-learning models and algorithms.

To evaluate interpretations, we introduce three desiderata: predictive accuracy, descriptive accuracy, and relevancy, where relevancy is judged by a human audience. Using these terms, we categorize a broad range of existing methods, all grounded in real-world examples†. In doing so, we provide a common vocabulary for researchers and practitioners to use in evaluating and selecting interpretation methods. We then show how our work enables a clearer discussion of open problems for future research.

A. Defining interpretable machine learning. On its own, interpretability is a broad, poorly defined concept. Taken to its full generality, to interpret data means to extract information (of some form) from it. The set of methods falling under this umbrella spans everything from designing an initial experiment to visualizing final results. In this overly general form, interpretability is not substantially different from the established concepts of data science and applied statistics.

Instead of general interpretability, we focus on the use of interpretations in the context of ML as part of the larger data-science life cycle. We define interpretable machine learning as the use of machine-learning models for the extraction of relevant knowledge about domain relationships contained in data. Here, we view knowledge as being relevant if it provides insight for a particular audience into a chosen problem.
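
To make the model-based versus post-hoc distinction above concrete, the following minimal sketch contrasts the two categories. It is our illustration rather than an example from the paper: the synthetic dataset, the choice of a lasso and a random forest, and the scikit-learn calls are assumptions made only for demonstration. A sparse linear model yields a model-based interpretation, since its nonzero coefficients directly name the features it uses; a more flexible model is instead interrogated post hoc, here via permutation importance.

```python
# Minimal sketch (illustrative assumptions only): model-based vs. post-hoc
# interpretation on a synthetic regression problem.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic data with only 3 truly informative features out of 10.
X, y = make_regression(n_samples=500, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Model-based interpretability via sparsity: the L1 penalty drives most
# coefficients to zero, so the fitted model itself identifies the relevant features.
sparse_model = Lasso(alpha=1.0).fit(X_train, y_train)
print("Nonzero lasso coefficients (feature indices):",
      np.flatnonzero(sparse_model.coef_))

# Post-hoc interpretability: fit a less transparent model, then compute an
# interpretation after the fact by permuting features and measuring the drop in score.
black_box = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
result = permutation_importance(black_box, X_test, y_test,
                                n_repeats=10, random_state=0)
print("Top features by permutation importance:",
      np.argsort(result.importances_mean)[::-1][:3])
```

In the first case the interpretation is the fitted form itself; in the second it is a separate quantity computed after fitting, whose faithfulness to the model (its descriptive accuracy, in the terms above) must be assessed on its own.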