2021
DOI: 10.1371/journal.pcbi.1008770

Principles for data analysis workflows

Abstract: A systematic and reproducible “workflow”—the process that moves a scientific investigation from raw data to coherent research question to insightful contribution—should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining 3 phases: the Explore, Refine, and Produce Phases. Each phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately c…

Cited by 26 publications (24 citation statements)
References 46 publications
“…Our study findings highlight the ways in which data scientists function as designers and co-designers of analyses when working alongside their clients. We noticed that many of the conversations and behaviors they described run parallel to the kinds of design processes that have been studied by ethnographers of design [8,9]; some data science practitioners have also discussed these parallels in research papers, blog posts, and podcasts [45][46][47][57]. For example, our participants reported needing to reconcile many competing interests - including the client's ambitions for a project, the limitations of available data sources, and constraints about the time and resources available to work on an analysis.…”
Section: Data Scientists As Designers
Citation type: mentioning
confidence: 99%
“…That being said, the conversation on software development for scientific research has shifted from “best” practices [28] to “good enough” practices [29]. Open-source scientific software is a collaborative endeavor requiring unique demands on researchers, and, therefore, standards should be adopted according to their appropriateness for your research community [30]. An individual or team of researchers should not strive to follow all best practices of software development, but rather strive to improve over time.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…Open hydrologists explicitly provide public access (e.g., through a link accessible on the journal publication site) to (1) raw data and associated metadata (including specifications of the devices used to collect data), (2) descriptions and citations for the analysis methods and software versions used, (3) workflows, code, and software developed to collect and analyze data, (4) descriptions of quality controls used when processing raw data, (5) final processed data, and (6) descriptive methods used to integrate data into other processing tools. The level of detail necessary to ensure openness can differ wildly between studies, but the workflow for data-intensive research should be clear and reproducible (Stoudt et al., 2021). When data sources, processing, and accessibility are complex, additional descriptions in an Appendix or Supplement may be appropriate upon publication of hydrologic research.…”
Section: Practical Guide To Open Data Collection and Analysis
Citation type: mentioning
confidence: 99%