2017
DOI: 10.1371/journal.pone.0181142

"What is relevant in a text document?": An interpretable machine learning approach

Abstract: Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to automatically map documents to these abstract concepts, allowing to annotate very large text collections, more than could be processed by a human in a lifetime. Besides predicting the text’s category very accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate tha…

Cited by 231 publications (209 citation statements)
References 25 publications
“…For instance, the authors of [44,46] showed that the winning method of the PASCAL VOC competition [23] was often not detecting the object of interest, but was utilizing correlations or context in the data to correctly classify an image. It recognized boats by the presence of water and trains by the presence of rails in the image, moreover, it recognized horses by the presence of a copyright watermark. The occurrence of the copyright tags in horse images is a clear artifact in the dataset, which had gone unnoticed to the organizers and participants of the challenge for many years.…”
Section: Explanations Help To Find "Clever Hans" Predictors (mentioning)
confidence: 99%
“…A popular measure for heatmap quality is based on perturbation analysis [9,75,6]. The assumption of this evaluation metric is that the perturbation of relevant (according to the heatmap) input variables should lead to a steeper decline of the prediction score than the perturbation of input dimensions which are of lesser importance.…”
Section: Evaluating Quality Of Explanations (mentioning)
confidence: 99%
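
The perturbation-based evaluation described in the statement above can be illustrated with a minimal sketch. It is not taken from the cited papers: the linear model, its weights `w`, the input `x`, the zero baseline, and the helper `perturbation_curve` are all assumptions chosen to keep the example small. The relevance scores are simply `w * x`, which is exact for a linear model with a zero baseline; a real study would use heatmaps from an explanation method instead.

```python
import numpy as np

def perturbation_curve(predict, x, order, baseline=0.0):
    """Return prediction scores after perturbing 0, 1, ..., len(order) inputs.

    Inputs are replaced by `baseline` one at a time, following `order`.
    """
    x_pert = np.array(x, dtype=float)  # work on a copy
    scores = [predict(x_pert)]
    for idx in order:
        x_pert[idx] = baseline
        scores.append(predict(x_pert))
    return np.array(scores)

# Hypothetical toy model: a linear scorer with made-up weights.
w = np.array([3.0, -1.0, 0.5, 2.0])
x = np.array([1.0, 1.0, 1.0, 1.0])

def predict(v):
    return float(w @ v)

# Stand-in "heatmap": per-input contribution to the score.
relevance = w * x

most_relevant_first = np.argsort(-relevance)
least_relevant_first = np.argsort(relevance)

curve_mrf = perturbation_curve(predict, x, most_relevant_first)
curve_lrf = perturbation_curve(predict, x, least_relevant_first)

# A lower mean score means a steeper decline under perturbation.
print("most relevant first: ", curve_mrf, "mean", curve_mrf.mean())
print("least relevant first:", curve_lrf, "mean", curve_lrf.mean())
```

Under the stated assumption, perturbing the most relevant inputs first should produce the steeper decline, so the first curve has the lower mean.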
“…However, quantitative evaluations are needed for more robust comparisons. Such evaluations have included measuring the impact of the deletion of words identified by the explanation approaches on the classification output (Arras et al., 2016, 2017), or testing whether the explanation was consistent with an underlying gold model (Ribeiro et al., 2016). These automatic evaluations are fast to carry out but act as a simplistic proxy for explanation quality.…”
Section: Related Work (mentioning)
confidence: 99%
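
The word-deletion evaluation mentioned in the statement above can be sketched on a toy example. Everything here is hypothetical: the lexicon `weights`, the sentence, and the helpers `score` and `delete_top_k` are illustrative stand-ins, not the classifier or explanation method used in the cited work. Word relevances are read directly from the lexicon, which is only valid for this additive toy scorer.

```python
def score(tokens, weights):
    """Toy additive classifier score: sum of per-word weights (unknown words count as 0)."""
    return sum(weights.get(t, 0.0) for t in tokens)

def delete_top_k(tokens, relevances, k):
    """Remove the k tokens with the highest relevance, keeping the remaining order."""
    ranked = sorted(range(len(tokens)), key=lambda i: relevances[i], reverse=True)
    drop = set(ranked[:k])
    return [t for i, t in enumerate(tokens) if i not in drop]

# Hypothetical lexicon for a "space" topic score.
weights = {"rocket": 2.0, "orbit": 1.5, "launch": 1.0, "the": 0.0, "recipe": -1.0}
tokens = ["the", "rocket", "reached", "orbit", "after", "launch"]

# Stand-in word relevances; a real evaluation would take these from an explanation method.
relevances = [weights.get(t, 0.0) for t in tokens]

# Classification score after deleting the 0, 1, 2, 3 most relevant words.
scores_after_deletion = [
    score(delete_top_k(tokens, relevances, k), weights) for k in range(4)
]
print(scores_after_deletion)
```

If the relevance ranking is informative, the score should fall quickly as the top-ranked words are removed; comparing against deletion in random order gives a reference point.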
“…Interpretable machine learning (ML) models, where the end user can understand how a decision was reached, are a critical requirement for the wide adoption of ML solutions in many fields such as healthcare, finance, and law (Alvarez-Melis and Jaakkola, 2017; Arras et al., 2017; Gilpin et al., 2018; Biran and Cotton, 2017). For complex natural language processing (NLP) such as question answering (QA), human readable explanations of the inference process have been proposed as a way to interpret QA models. To which organ system do the esophagus, liver, pancreas, small intestine, and colon belong?…”
Section: Introduction (mentioning)
confidence: 99%