2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT 2022)
DOI: 10.1145/3531146.3533240
The Spotlight: A General Method for Discovering Systematic Errors in Deep Learning Models

Cited by 29 publications (12 citation statements)
References 16 publications
“…SliceFinder is one method that uses metadata to find slices with significantly high loss [15]. Often, there is not enough metadata to define slices with high error, so another family of methods uses model embeddings and clustering to find groups with high error [18,19]. Lastly, there are approaches that use end-user reports or crowd feedback to discover model failures or interesting behaviors [2,8,43].…”
Section: Model Evaluation Approaches
Citation type: mentioning (confidence: 99%)
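The embedding-and-clustering family described in the excerpt above can be illustrated with a minimal sketch: cluster precomputed model embeddings, then rank clusters by error rate to surface candidate systematic failures. The function name, cluster count, and the use of scikit-learn KMeans are illustrative assumptions, not the procedure of any specific cited paper.

# Sketch: find candidate error slices by clustering model embeddings and
# ranking clusters by error rate. Embeddings, labels, and predictions are
# assumed to be precomputed numpy arrays (one row / entry per example).
import numpy as np
from sklearn.cluster import KMeans

def find_error_slices(embeddings: np.ndarray,
                      labels: np.ndarray,
                      preds: np.ndarray,
                      n_clusters: int = 20,
                      min_size: int = 30):
    """Cluster embeddings and return clusters ordered by error rate."""
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=0).fit_predict(embeddings)
    slices = []
    for c in range(n_clusters):
        mask = cluster_ids == c
        if mask.sum() < min_size:  # ignore tiny clusters
            continue
        error_rate = float((labels[mask] != preds[mask]).mean())
        slices.append({"cluster": c,
                       "size": int(mask.sum()),
                       "error_rate": error_rate,
                       "indices": np.flatnonzero(mask)})
    # Highest-error clusters are the candidate systematic failures
    return sorted(slices, key=lambda s: s["error_rate"], reverse=True)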
“…In addition to detecting spurious correlations, explainable AI techniques can also reveal human-subliminal signals about the disease and give us a better chance of addressing challenges posed by the pandemic. Lastly, slice discovery methods 67,68, such as Domino 69, can identify and describe semantically meaningful subsets of data on which the model performs poorly, thereby revealing spurious correlations.…”
Section: Model Building and Evaluation
Citation type: mentioning (confidence: 99%)
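Domino, cited in the excerpt above, fits an error-aware mixture model over cross-modal (image-text) embeddings and then describes the discovered slices in natural language. The sketch below only approximates that first step under assumed inputs, using a plain scikit-learn GaussianMixture over embeddings augmented with an up-weighted error signal; it is an illustrative approximation, not the published Domino algorithm.

# Sketch of a Domino-style slice discovery step: fit a mixture model over
# embeddings augmented with an error signal so that components (slices)
# tend to separate correct from incorrect predictions. Approximation only;
# not the published Domino algorithm.
import numpy as np
from sklearn.mixture import GaussianMixture

def discover_slices(embeddings: np.ndarray,
                    errors: np.ndarray,        # 1.0 where the model is wrong, else 0.0
                    n_slices: int = 10,
                    error_weight: float = 5.0):
    # Up-weight the error signal so the mixture prefers error-coherent slices
    features = np.hstack([embeddings, error_weight * errors[:, None]])
    gmm = GaussianMixture(n_components=n_slices, covariance_type="diag",
                          random_state=0).fit(features)
    assignments = gmm.predict(features)
    # Report slices ordered by how concentrated the errors are in them
    slice_error = [(k, float(errors[assignments == k].mean()))
                   for k in range(n_slices) if (assignments == k).any()]
    return sorted(slice_error, key=lambda t: t[1], reverse=True)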
“…Prominent examples include facial recognition models failing to recognize women with dark skin tones [14] or translation models perpetuating gender stereotypes [118]. However, subgroups where a model fails can be highly contextual, specific, and may not match any social category (i.e., "men wearing thin framed glasses" [18] or "busy/cluttered workspace" [30]). It remains an open challenge for ML practitioners to detect which specific use case scenarios are likely to fail out of a possibly infinite space of model inputs -and prioritize which failures have the greatest potential for harm [7,43].…”
Section: Introduction
Citation type: mentioning (confidence: 99%)