2021
DOI: 10.1145/3458723

Datasheets for datasets

Abstract: Documentation to facilitate communication between dataset creators and consumers.

citations
Cited by 888 publications
(641 citation statements)
references
References 5 publications
“…Indeed, better provenance of the process by which data is generated will be critical in order to disentangle the source of dataset differences (for example, if clinical practices or environmental and social factors are giving rise to different healthcare measures and outcomes). Following guidelines developed for documenting datasets (72) and models (73) in the machine learning community, similar guidelines should be established for models in healthcare as well (10). An example is the proposal for reporting subgroup-level performances in MI-CLAIM checklist (74).…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…In line with these observations, Gebru et al. [10] propose the use of datasheets for datasets. They suggest that every dataset should be accompanied by a datasheet that documents its motivation, composition, collection process, recommended uses, and other important aspects, with the ultimate goal of increasing transparency and accountability within the community, mitigating unwanted biases in ML systems, and encouraging reproducibility of ML experiments.…”
Section: Challenges and Opportunities For Research Parasites In The Building Of Fair ML Systems
Citation type: mentioning
confidence: 95%
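
The datasheet sections named in the statement above (motivation, composition, collection process, recommended uses) could be captured in a small record type. The following is only an illustrative sketch: the class name `Datasheet`, its methods, and the sample values are hypothetical and are not taken from Gebru et al., whose datasheets are free-text answers to a longer list of questions.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Datasheet:
    """Hypothetical container; field names mirror the sections listed in the
    citation statement, not an official schema."""
    motivation: str = ""
    composition: str = ""
    collection_process: str = ""
    recommended_uses: str = ""

    def missing_sections(self) -> List[str]:
        # A section counts as missing when it is empty or whitespace only.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        # Emit one "Title: text" line per section, flagging undocumented ones.
        lines = []
        for f in fields(self):
            title = f.name.replace("_", " ").capitalize()
            body = getattr(self, f.name).strip() or "(not documented)"
            lines.append(f"{title}: {body}")
        return "\n".join(lines)


# Made-up example values, for illustration only.
sheet = Datasheet(
    motivation="Benchmark for a toy classification task.",
    composition="1,000 labelled text snippets.",
)
print(sheet.render())
print(sheet.missing_sections())
```

Listing the undocumented sections explicitly is one way a dataset consumer might notice gaps before reuse; a real datasheet would cover more aspects than these four.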
“…When creating a new dataset or challenge, it is advisable to document the dataset with its characteristics, and thus possible model limitations. Possibilities include data sheets [Gebru et al, 2018], which describe the data collection procedure, and model cards [Mitchell et al, 2019], which describe the choices made to train a model (including the data).…”
Section: Let Us Build Awareness Of Data Limitations
Citation type: mentioning
confidence: 99%