2020
DOI: 10.1016/j.patter.2020.100136

Dataset Reuse: Toward Translating Principles to Practice

Abstract: Highlights
- A compilation of reusability features of datasets from the literature
- A corpus of 1.47 million datasets from 65,537 repositories sourced from GitHub
- A case study on GitHub using a five-step approach to understand projected data reuse
- A machine learning model that helps to predict dataset reuse

citations
Cited by 25 publications
(21 citation statements)
references
References 88 publications
“…This importance will only grow as researchers increasingly reuse data. 48 However, the impact of data handling on computational models is, as of yet, a poorly studied area.…”
Section: Discussion (mentioning)
confidence: 99%
“…Given the proximity of such users to actual databases, GitHub is a rich source for heterogeneous tables. Prior analyses of CSV files from GitHub also found that these files have diverse formatting and the tables extracted from them have relatively large dimensions [35,19]. These properties are common across database contexts [35,20], so that we consider CSV files from GitHub a suitable resource for database-like tables (C2).…”
Section: Design Principles of GitTables (mentioning)
confidence: 99%
“…With the increasing digitisation of research processes, there has been a significant call for the wider adoption of interoperable sharing of data and its associated metadata. We refer to [72] for a comprehensive overview and recommendations, in particular for data; notably that review highlights the wide variety of metadata and documentation that the literature prescribes for enabling data reuse. Likewise, we suggest [82] that covers the importance of metadata standards in reproducible computational research.…”
Section: Related Work (mentioning)
confidence: 99%