2022
DOI: 10.48550/arxiv.2207.13243
Preprint

Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

Abstract: The last decade of machine learning has seen drastic increases in scale and capabilities, and deep neural networks (DNNs) are increasingly being deployed across a wide range of domains. However, the inner workings of DNNs are generally difficult to understand, raising concerns about the safety of using these systems without a rigorous understanding of how they function. In this survey, we review literature on techniques for interpreting the inner components of DNNs, which we call inner interpretability methods…

Cited by 3 publications (1 citation statement)
References 172 publications
“…Mechanistic Interpretability. Mechanistic interpretation explains how LMs work by reverse engineering, i.e., reconstructing LMs with different components (Räuker et al., 2022). A recent line of work provides interpretation focusing on the LM's weights and intermediate representations (Olah et al., 2017, 2018, 2020).…”
Section: Related Work (mentioning)
confidence: 99%