2020
DOI: 10.48550/arxiv.2010.12016
Preprint

Towards falsifiable interpretability research

Matthew L. Leavitt, Ari Morcos

Abstract: Methods for understanding the decisions of and mechanisms underlying deep neural networks (DNNs) typically rely on building intuition by emphasizing sensory or semantic features of individual examples. For instance, methods aim to visualize the components of an input which are "important" to a network's decision, or to measure the semantic properties of single neurons. Here, we argue that interpretability research suffers from an over-reliance on intuition-based approaches that risk – and in some cases have caus…

Cited by 9 publications (21 citation statements)
References 63 publications
“…This lack of consensus is worrying, as measures are often designed according to different and incompatible intuitive desiderata, such as whether finding a one-to-one assignment, or finding few-to-one mappings, between neurons is more appropriate [17]. As a community, we need well-chosen formal criteria for evaluating metrics to avoid over-reliance on intuition and the pitfalls of too many researcher degrees of freedom [14].…”
Section: Introduction (mentioning)
Confidence: 99%
“…Taken together, our empirical results show that the widely used visualization method by Olah et al. (2017) is more limited in its ability to convey causal understanding of CNN activations than previously assumed. This reinforces the importance of testing falsifiable hypotheses in the field of interpretable artificial intelligence (Leavitt & Morcos, 2020). Feature visualizations certainly have an important place within the fields of interpretability and explainability, and their importance is likely to grow further with increased societal applications of machine learning.…”
Section: Discussion (mentioning)
Confidence: 63%
“…Explanation methods such as feature visualizations have been criticized as intuition-driven (Leavitt & Morcos, 2020), and it is unclear whether they allow humans to gain a precise understanding of which image features "cause" high activation in a unit. Here, we propose an objective psychophysical task to quantify how well these synthetic images support causal understanding of CNN units.…”
Section: Discussion (mentioning)
Confidence: 99%
“…Development of methods for relating complex later-layer factors to well-understood early layer factors is an important priority for further interpretability work in complex domains. Finally, we note that although these interpretations of the above factors bear out in the majority of the randomly selected positions shown in the online database of factors, an interpretation can only be considered definitive once it has been quantitatively validated [91], ideally by intervening on the input.…”
Section: Results (mentioning)
Confidence: 98%