Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018
DOI: 10.18653/v1/p18-2119

Tackling the Story Ending Biases in The Story Cloze Test

Abstract: The Story Cloze Test (SCT) is a recent framework for evaluating story comprehension and script learning. There have been a variety of models tackling the SCT so far. Although the original goal behind the SCT was to require systems to perform deep language understanding and commonsense reasoning for successful narrative understanding, some recent models could perform significantly better than the initial baselines by leveraging human-authorship biases discovered in the SCT dataset. In order to shed some light o…

Cited by 55 publications (51 citation statements)
References 15 publications (25 reference statements)
“…Recent studies have tried to create new NLI datasets that do not contain such artifacts, but many approaches to dealing with this issue remain unsatisfactory: constructing new datasets (Sharma et al., 2018) is costly and may still result in other artifacts; filtering "easy" examples and defining a harder subset is useful for evaluation purposes (Gururangan et al., 2018), but difficult to do on a large scale that enables training; and compiling adversarial examples (Glockner et al., 2018) is informative but again limited by scale or diversity. Instead, our goal is to develop methods that overcome these biases as datasets may still contain undesired artifacts despite annotation efforts.…”
Section: Introduction (mentioning)
confidence: 99%
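The "filter easy examples" idea credited above to Gururangan et al. (2018) can be illustrated with a minimal sketch, not taken from any of the cited papers: train a classifier that sees only the candidate ending (no story context), then keep for evaluation only the examples that classifier gets wrong, since those cannot be solved from ending style alone. The toy data, field layout, and classifier choice below are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the cited method): an ending-only
# classifier is used to identify and drop "easy" examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical (ending_text, label) pairs; label 1 = correct ending.
train = [
    ("She was thrilled with the result.", 1),
    ("He decided to burn the house down.", 0),
    ("They all had a wonderful time.", 1),
    ("The dog exploded into confetti.", 0),
]
dev = [
    ("Everyone went home happy.", 1),
    ("She hated every second and smiled.", 0),
]

vec = CountVectorizer(ngram_range=(1, 2))
X_train = vec.fit_transform([text for text, _ in train])
y_train = [label for _, label in train]
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Keep only dev examples the ending-only model misclassifies: these form
# the "hard" evaluation subset that surface artifacts alone cannot solve.
X_dev = vec.transform([text for text, _ in dev])
hard_subset = [ex for ex, pred in zip(dev, clf.predict(X_dev)) if pred != ex[1]]
print(f"kept {len(hard_subset)} of {len(dev)} dev examples")
```

As the citation statement notes, this kind of filtering is useful for evaluation but hard to scale up enough to support training.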
“…Recent research identified a key problem with the narrative cloze test, namely that language modeling approaches perform well without learning about events (Pichotta and Mooney, 2014; Rudinger et al., 2015). This drove the establishment of a new task: the story cloze test, where the goal is to select the correct ending for a story given two candidate endings (Mostafazadeh et al., 2016a; Sharma et al., 2018). Several works showed that incorporating event sequence information provides improvement in this task (Peng et al., 2017; Chaturvedi et al., 2017b).…”
Section: Narrative Understanding (mentioning)
confidence: 99%
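For readers unfamiliar with the Story Cloze Test setup described in this statement, the following toy sketch (not any cited system) illustrates the task format only: a short story context plus two candidate endings, from which a system must select one. The word-overlap scorer and example story are hypothetical stand-ins for a real model.

```python
# Minimal sketch of the Story Cloze Test protocol (assumed toy example,
# not a competitive model): score each candidate ending against the
# context and return the index of the higher-scoring one.
def score(context: str, ending: str) -> float:
    """Toy score: fraction of ending tokens that also appear in the context."""
    ctx_tokens = set(context.lower().split())
    end_tokens = ending.lower().split()
    return sum(tok in ctx_tokens for tok in end_tokens) / max(len(end_tokens), 1)

def choose_ending(context: str, ending1: str, ending2: str) -> int:
    """Return 1 or 2, whichever candidate ending scores higher."""
    return 1 if score(context, ending1) >= score(context, ending2) else 2

# Hypothetical story in the four-sentence ROCStories / SCT style.
context = ("Karen packed her bags for the beach. She drove for three hours. "
           "When she arrived, the sky turned dark. Rain began to pour.")
print(choose_ending(context,
                    "Karen spent the afternoon reading inside the beach house.",
                    "Karen built a rocket and flew to the moon."))
```

Shallow scorers of this kind are exactly what the surveyed bias analyses caution against, since ending-only cues can inflate their accuracy.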
“…Generating an appropriate ending for a story was also studied by Guan et al. (2018) and Sharma et al. (2018). Research on generating stories from a sequence of images is a newer line of work (Lukin et al., 2018; Kim et al., 2018; Hsu et al., 2018; Gonzalez-Rico and Fuentes-Pineda, 2018).…”
Section: Introduction (mentioning)
confidence: 99%