Findings of the Association for Computational Linguistics: EMNLP 2021 (2021)
DOI: 10.18653/v1/2021.findings-emnlp.414

‘Just What do You Think You’re Doing, Dave?’ A Checklist for Responsible Data Use in NLP

Abstract: A key part of the NLP ethics movement is responsible use of data, but exactly what that means or how it can be best achieved remain unclear. This position paper discusses the core legal and ethical principles for collection and sharing of textual data, and the tensions between them. We propose a potential checklist for responsible data (re-)use that could both standardise the peer review of conference submissions, as well as enable a more in-depth view of published research across the community. Our proposal ai…

Cited by 24 publications (22 citation statements)
References 40 publications
“…Although major computational linguistics venues started to require statements about legal and ethical aspects of data collection and sharing, not all the venues require such statements. It is important to be aware of the existing guidelines, such as ACM code of ethics (Gotterbarn et al, 2018), or the guidelines adapted by major CL conferences, 14 as well as the recent discussion in the field (e.g., Šuster et al, 2017; Rogers et al, 2021). Even though the common guidelines may not fit every task, or every legal jurisdiction, being aware of potential issues, and being explicit about the legal and ethical considerations during data collection and annotation is important.…”
Section: Discussion (mentioning)
confidence: 99%
“…Note that "legal" here refers only to formal law. This distinguishes my scope from (no less important) work on "ethical", "fair" or "responsible" AI [22,13,24]. Despite clear overlaps, neither is a subset of the other.…”
Section: When Are Datasets Legal? (mentioning)
confidence: 95%
“…To statisticians, better typically means unbiased, though "bias" is used differently from in the bias-variance tradeoff [8], or in algorithmic bias [7]. The growing "responsible AI" literature emphasizes that datasets are better when they are ethically and fairly sourced [22,13,24]. This paper underscores legality as one desideratum for "better".…”
Section: Introduction (mentioning)
confidence: 93%