2023
DOI: 10.1111/jcal.12784

The use of annotations to explain labels: Comparing results from a human‐rater approach to a deep learning approach

Abstract: Background: Deep learning methods, where models do not use explicit features and instead rely on implicit features estimated during model training, suffer from an explainability problem. In text classification, saliency maps that reflect the importance of words in prediction are one approach toward explainability. However, little is known about whether the salient words agree with those identified by humans as important. Objectives: The current study examines in‐line annotations from human annotators and salienc…
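As a concrete illustration of the saliency-map idea described in the abstract, the following is a minimal sketch in which each word's importance is estimated by how much the predicted probability drops when the word is removed (occlusion). The toy classifier, toy responses, and the crisis-alert label are illustrative assumptions, not the deep learning model or the saliency methods used in the study.

```python
# Minimal occlusion-based word-importance sketch (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data; label 1 = response that should trigger a crisis alert.
texts = [
    "i feel hopeless and want to hurt myself",
    "nobody would care if i was gone",
    "the reading passage was about volcanoes",
    "i enjoyed writing about my summer",
]
labels = [1, 1, 0, 0]
clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

def occlusion_saliency(text, model, target=1):
    """Score each word by the drop in P(target) when that word is removed."""
    words = text.split()
    base = model.predict_proba([text])[0][target]
    scores = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((word, base - model.predict_proba([reduced])[0][target]))
    return scores

for word, score in occlusion_saliency(texts[0], clf):
    print(f"{word:10s} {score:+.3f}")
```

Words with the largest positive scores form the machine's "salient words", which can then be compared against the words human annotators marked as important.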

Cited by 5 publications (5 citation statements). References 34 publications.
“…Although in a very different application context, Lottridge et al (2023) followed a similar purpose and methodological setup, which led to similar results to those of Gombert et al (2023) regarding the lack of correspondence between what machines and humans identified at the output level as the determining factors in the classification of responses. The purpose of the built classifier is to identify students at risk—due to anxiety, abuse, suicidal tendencies, and the like—who mention their critical situation in their text responses when participating in a large‐scale assessment.…”
Section: Paper Contributions (mentioning)
confidence: 73%
“…Similar to the study by Lottridge et al (2023), the study by Andersen et al (2023) was situated in large-scale assessments where large amounts of human resources are devoted to manually coding constructed responses. To keep the human in the loop and in control, Andersen et al (2023) developed a method that dynamically varies the amount of automatically scored short text responses, depending on the performance of the automatic system.…”
Section: Text Data (mentioning)
confidence: 99%
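As a loose illustration of such human-in-the-loop control (a hedged sketch, not Andersen et al.'s actual method; the engine interface and all thresholds are invented for illustration), an engine might auto-score only responses it is confident about and tighten that cutoff whenever its agreement with human raters on a monitoring sample drops:

```python
# Hedged sketch of confidence-based routing between an automated scoring
# engine and human raters; not the method from Andersen et al. (2023).
def route_responses(responses, engine, monitored_agreement,
                    target_agreement=0.85, base_cutoff=0.90):
    """Split responses into auto-scored and human-routed sets.

    engine(response) is assumed to return (score, confidence); the interface
    and the thresholds here are illustrative assumptions.
    """
    # If monitored human-machine agreement falls below target, require higher
    # confidence, so a smaller share of responses is scored automatically.
    cutoff = base_cutoff
    if monitored_agreement < target_agreement:
        cutoff = min(0.99, base_cutoff + (target_agreement - monitored_agreement))

    auto_scored, for_humans = [], []
    for response in responses:
        score, confidence = engine(response)
        if confidence >= cutoff:
            auto_scored.append((response, score))
        else:
            for_humans.append(response)
    return auto_scored, for_humans
```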
“…El Zini et al employed LIME, Anchors, and SHAP when introducing a new dataset to evaluate the performances of different models for sentiment analysis [136]. Lottridge et al compared annotations from humans with respect to explanations provided by both LIME and Integrated Gradients within the scope of crisis alert identification [137]. Arashpour et al compared a wide range of explainability methods falling into the classes of perturbation-based methods and gradient-based methods (Integrated Gradients, Gradient SHAP, Occlusion, the Fast Gradient Sign Method, Projected Gradient Descent, Minimal Perturbation, and Feature Ablation) for waste categorization in images [138].…”
Section: Perturbation-based Methods (mentioning)
confidence: 99%
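For concreteness, the following is a minimal sketch of extracting LIME word attributions for a text classifier and comparing them against words a human annotator marked as important, in the spirit of the human-versus-machine comparisons described above. The classifier, toy responses, human annotation, and agreement metric are illustrative assumptions, not the pipelines used in the cited studies.

```python
# Hedged sketch: LIME word attributions vs. a human-annotated word set.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data; label 1 = response that should trigger a crisis alert.
train_texts = [
    "i feel hopeless and think about hurting myself",
    "no one can help me anymore",
    "the reading passage was about volcanoes",
    "my favourite subject is math",
]
train_labels = [1, 1, 0, 0]
clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(train_texts, train_labels)

response = "i feel hopeless and think about hurting myself"
human_words = {"hopeless", "hurting"}  # words a human rater might annotate

explainer = LimeTextExplainer(class_names=["no_alert", "alert"])
explanation = explainer.explain_instance(
    response, clf.predict_proba, num_features=5, labels=(1,)
)
# Keep words whose weights push the prediction toward the alert class.
machine_words = {word for word, weight in explanation.as_list(label=1) if weight > 0}

overlap = human_words & machine_words
print("machine words:", machine_words)
print("Jaccard agreement:", len(overlap) / len(human_words | machine_words))
```

An Integrated Gradients variant would follow the same comparison step, only swapping in gradient-based attributions from a differentiable model.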
“…Several automated scoring engines have been used in operations for scoring short‐constructed responses (e.g., Braun et al., 2006; Heilman & Madnani, 2015), long essays (e.g., Attali & Burstein, 2006), and mathematical equations (Fife, 2017). Recently, the transformer architecture underlying most LLMs has shown excellent performance in automated scoring of short‐constructed responses, essays, and speech (Lottridge et al., 2022a; Lottridge et al., 2022b; Lottridge et al., 2022c). In fact, the winners of the 2021 NAEP automated scoring competition all used the transformer architecture (https://github.com/NAEP-AS-Challenge/reading-prediction/blob/main/results.md).…”
Section: Impacts and Implications For Assessment (mentioning)
confidence: 99%
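To make the architecture concrete, the following is a minimal sketch, assuming the Hugging Face transformers library and a generic pretrained encoder, of how a transformer classification head could assign a 0-2 score to short constructed responses. It is not one of the cited operational engines, and a real system would first fine-tune the head on human-scored responses.

```python
# Hedged sketch of transformer-based scoring of short constructed responses.
import torch
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # stand-in; operational engines use their own checkpoints
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

responses = [
    "photosynthesis turns sunlight, water, and carbon dioxide into glucose and oxygen",
    "plants eat dirt to grow",
]
batch = tokenizer(responses, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    logits = model(**batch).logits        # shape: (num_responses, 3)
scores = softmax(logits, dim=-1).argmax(dim=-1)

for text, score in zip(responses, scores):
    print(f"predicted score {score.item()}: {text}")
# Without fine-tuning on human-scored responses, the classification head is
# randomly initialized and these predictions are arbitrary.
```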