2021
DOI: 10.48550/arxiv.2109.04922
Preprint

Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers

Abstract: As large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks, statistical bias in benchmark data and probing studies have recently called into question their true capabilities. For a more informative evaluation than accuracy on text classification tasks can offer, we propose evaluating systems through a novel measure of prediction coherence. We apply our framework to two existing language understanding benchmarks with different properties to d…

Cited by 1 publication (1 citation statement)
References: 31 publications

“…Entailment is a closely related concept to Next Sentence Prediction and has received research stimulation due to revolutionary advances in natural language inference (NLI) models [22]- [24]. These models were trained to detect entailment as classification task, and have been used to infer entailment as an indication of coherence in diverse systems [19], [25], [26]. Entailment is a stricter concept, where the goal is to predict whether a sentence pair (premise and hypothesis) is contradictory, in agreement, or neutral.…”
Section: Introduction
Citation type: mentioning
confidence: 99%