Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.29
A Systematic Review of Reproducibility Research in Natural Language Processing

Abstract: Against the background of what has been termed a reproducibility crisis in science, the NLP field is becoming increasingly interested in, and conscientious about, the reproducibility of its results. The past few years have seen an impressive range of new initiatives, events and active research in the area. However, the field is far from reaching a consensus about how reproducibility should be defined, measured and addressed, with diversity of views currently increasing rather than converging. With this focused…

Cited by 77 publications (58 citation statements) | References 35 publications
“…Our contributions are threefold. We first complete a reproduction of state-of-the-art cross-topic stance detection work (Reimers et al., 2019), as reproduction has repeatedly been shown to be relevant for NLP (Fokkens et al., 2013; Cohen et al., 2018; Belz et al., 2021). The reproduction is largely successful: we obtain similar numeric results.…”
Section: Introduction
confidence: 91%
“…We adopt the definition of reproduction by Belz et al. (2021): repeating the experiments as described in the earlier study, with the exact same data and software. We analyze our reproduced results according to the three dimensions of reproduction proposed by Cohen et al. (2018): whether we find either the same or different (1) (numeric) values, (2) findings, and (3) conclusions as the earlier study.…”
Section: Generalization To New Topics
confidence: 99%
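The first two of Cohen et al.'s (2018) dimensions lend themselves to a mechanical check once both sets of scores are available. The sketch below is a minimal illustration of such a check; the function name, the tolerance, and the example scores are assumptions made for illustration, not taken from any of the cited papers.

def compare_reproduction(original, reproduced, tol=0.01):
    """Compare original and reproduced metric scores along the first two
    dimensions of Cohen et al. (2018); conclusions need human judgment."""
    # Dimension 1 (values): are the numbers the same within a tolerance?
    same_values = all(abs(original[m] - reproduced[m]) <= tol for m in original)
    # Dimension 2 (findings): approximated here as whether the relative
    # ordering of the reported scores is preserved (e.g., the best-scoring
    # setting stays the best); real findings can be richer than this.
    same_findings = (sorted(original, key=original.get)
                     == sorted(reproduced, key=reproduced.get))
    return {"same_values": same_values,
            "same_findings": same_findings,
            # Dimension 3 (conclusions): what the findings imply for the
            # paper's claims; this cannot be automated.
            "same_conclusions": "requires manual assessment"}

# Hypothetical F1 scores per topic from an original run and a reproduction.
print(compare_reproduction({"topic_A": 0.71, "topic_B": 0.64},
                           {"topic_A": 0.70, "topic_B": 0.65}))

On these made-up numbers the check reports same values (within tolerance) and same findings, the kind of outcome the citing authors describe as a largely successful reproduction.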
“…In addition, raw annotations can shed light on the difficulty of the task and the nature of the data: they can be aggregated in multiple ways (Oortwijn et al., 2021), or used to account for annotator bias in model training (Beigman and Beigman Klebanov, 2009). Finally, releasing annotated judgments makes it possible to replicate and further analyze the evaluation outcome (Belz et al., 2021).…”
Section: Releasing Annotations
confidence: 99%
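To make the point about aggregating raw annotations in multiple ways concrete, the following is a small hypothetical sketch contrasting two aggregation strategies; the item names, labels, and functions are invented for illustration and do not come from the cited work.

from collections import Counter

def majority_vote(labels):
    """Aggregate by taking the most frequent label (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

def unanimous_only(labels):
    """Stricter aggregation: keep an item only when all annotators agree."""
    return labels[0] if len(set(labels)) == 1 else None

# Hypothetical raw judgments from three annotators per item.
raw = {"item_1": ["favor", "favor", "against"],
       "item_2": ["against", "against", "against"]}

print({k: majority_vote(v) for k, v in raw.items()})   # {'item_1': 'favor', 'item_2': 'against'}
print({k: unanimous_only(v) for k, v in raw.items()})  # {'item_1': None, 'item_2': 'against'}

Because the two strategies yield different datasets (the stricter one drops item_1 entirely), releasing the raw judgments rather than a single aggregation is what makes replication and re-analysis possible.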
“…For ST (style transfer), the lack of detail and clarity in describing evaluation protocols makes it difficult to improve them, as has been pointed out for other NLG tasks by Shimorina and Belz (2021), who propose evaluation datasheets for clear documentation of human evaluations, Lee (2020) and van der Lee et al., who propose best-practices guidelines, and Belz et al. (2021), who raise concerns regarding reproducibility. This issue is particularly salient for ST tasks, where stylistic changes are defined implicitly by data (Jin et al., 2021) and where the instructions given to human judges for style transfer might be the only explicit characterization of the style dimension targeted.…”
Section: Standardizing Evaluation Protocols
confidence: 99%