Proceedings of the 4th Clinical Natural Language Processing Workshop 2022
DOI: 10.18653/v1/2022.clinicalnlp-1.7
What Do You See in this Patient? Behavioral Testing of Clinical NLP Models

Abstract: Decision support systems based on clinical notes have the potential to improve patient care by pointing doctors towards overlooked risks. Predicting a patient's outcome is an essential part of such systems, for which the use of deep neural networks has shown promising results. However, the patterns learned by these networks are mostly opaque, and previous work has revealed both reproduction of systemic biases and unexpected behavior for out-of-distribution patients. For application in clinical practice it is crucial …
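The outcome-prediction setup the abstract describes can be pictured as a standard text classifier over clinical notes. A minimal sketch, assuming a fine-tuned Hugging Face checkpoint; the model name below is a placeholder, not the authors' released artifact:

```python
# Minimal sketch of outcome prediction from a clinical note.
# MODEL_NAME is a hypothetical placeholder, not the paper's released model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "some-org/clinical-outcome-classifier"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

note = "Admission note: 67-year-old patient with dyspnea and elevated troponin."
inputs = tokenizer(note, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
# Sigmoid for multi-label outcomes (e.g., diagnoses, risk flags).
probs = torch.sigmoid(logits)
```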

Cited by 6 publications (6 citation statements) · References 32 publications

Citation statements (ordered by relevance):
“…For this transfer, we apply an additional label-wise attention mechanism that further improves the interpretability of our method by highlighting the most relevant parts of a clinical note regarding a diagnosis. While deep neural models have been widely applied to outcome prediction tasks in the past (Shamout et al., 2020), their black-box nature remains a large obstacle for clinical application (van Aken et al., 2022). We argue that decision support is only possible when model predictions are accompanied by justifications that enable clinicians to follow a lead or to potentially discard predictions.…”
Section: Introduction · Citation type: mentioning
Confidence: 93%
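The label-wise attention mechanism mentioned in the statement above can be sketched as one attention distribution per label over token representations. A minimal sketch of the generic form of this technique; the parameterization and dimensions are my assumptions, not the citing paper's exact architecture:

```python
import torch
import torch.nn as nn

class LabelWiseAttention(nn.Module):
    """One attention distribution per label over token states, so the
    tokens driving each diagnosis prediction can be inspected."""
    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_dim))
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, token_states):  # (batch, seq_len, hidden_dim)
        # (batch, num_labels, seq_len): per-label attention over tokens
        scores = torch.einsum("lh,bsh->bls", self.label_queries, token_states)
        attn = torch.softmax(scores, dim=-1)
        # (batch, num_labels, hidden_dim): label-specific note representations
        label_repr = torch.einsum("bls,bsh->blh", attn, token_states)
        logits = self.classifier(label_repr).squeeze(-1)  # (batch, num_labels)
        return logits, attn
```

Inspecting `attn[b, l]` shows which tokens of note `b` most influenced label `l`, which is the interpretability property the statement refers to.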
“…There are systematic studies of behavioral testing on classification tasks such as sentiment analysis (Ribeiro et al., 2020), hate speech detection (Röttger et al., 2021) and clinical outcome prediction (van Aken et al., 2022). However, implementing behavioral testing in MT tasks poses significant challenges.…”
Section: 𝒓 · Citation type: mentioning
Confidence: 99%
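A behavioral test in the CheckList style cited above can be as simple as an invariance check: perturb an attribute that should not matter and require the prediction to stay stable. A minimal sketch; `predict_proba` is a hypothetical stand-in for any clinical note classifier, not a real API:

```python
# Sketch of a CheckList-style invariance test for a note classifier.

def predict_proba(note: str) -> float:
    return 0.5  # placeholder score; replace with a real model's probability

def invariance_test(template: str, fillers: list[str],
                    tolerance: float = 0.05) -> bool:
    """Pass if predictions stay within `tolerance` across all substitutions."""
    probs = [predict_proba(template.format(filler)) for filler in fillers]
    return max(probs) - min(probs) <= tolerance

template = "The patient is a 54-year-old {} admitted with pneumonia."
assert invariance_test(template, ["man", "woman"])  # gender should not move the score
```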
“…With the high-quality test cases and their pseudo-references crafted by BTPG, we can conduct behavioral testing. [Footnote 2: Our framework indeed involves a test set with references, but this is not a very strong requirement, since there are many off-the-shelf test datasets and in particular this is a standard setting in behavioral testing. For instance, the pioneering research on behavioral testing (Ribeiro et al., 2020) and many follow-up works (Röttger et al., 2021; van Aken et al., 2022) all use a test dataset with ground-truth labels.]…”
Section: 𝒓 · Citation type: mentioning
Confidence: 99%
“…There is a famous saying in the AI community that “a model is as good as the data it is trained on,” resembling the popular saying “garbage in, garbage out.” In this regard, existing biases in the training data will be amplified by the model. van Aken et al. trained several clinical NLP models using transfer learning from different sources and found that the trained models were biased with respect to gender, ethnicity, and age [34]. For example, when a model is told that a patient is of a specific race (compared with not mentioning ethnicity at all), it outputs a higher probability of drug abuse for an otherwise identical note, although the data show that drug-abuse rates do not differ by ethnicity [35].…”
Section: Caveats and Biases · Citation type: mentioning
Confidence: 99%
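The probe this statement describes can be reproduced as a simple counterfactual comparison: score the same note with and without an ethnicity mention and measure the shift. A sketch under the assumption of a `predict_drug_abuse_proba` wrapper, which is hypothetical, not a real API:

```python
# Counterfactual bias probe: does mentioning ethnicity alone shift the
# predicted drug-abuse probability?

def predict_drug_abuse_proba(note: str) -> float:
    return 0.5  # placeholder; replace with a real clinical outcome model

base = "45-year-old patient presenting with lower back pain, requests analgesics."
counterfactual = base.replace("patient", "African American patient")  # only change

gap = predict_drug_abuse_proba(counterfactual) - predict_drug_abuse_proba(base)
print(f"Probability shift from mentioning ethnicity alone: {gap:+.3f}")
# A consistently positive gap reproduces the bias the statement describes.
```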