2019
DOI: 10.1371/journal.pbio.3000125

Developing a modern data workflow for regularly updated data

Abstract: Over the past decade, biology has undergone a data revolution in how researchers collect data and the amount of data being collected. An emerging challenge that has received limited attention in biology is managing, working with, and providing access to data under continual active collection. Regularly updated data present unique challenges in quality assurance and control, data publication, archiving, and reproducibility. We developed a workflow for a long-term ecological study that addresses many of the chal…

Cited by 38 publications (39 citation statements)
References 33 publications
“…In addition, open data products can increase efficiency of the individual researcher and a collective research team by encouraging collaborators to adopt an open science workflow. Many tools developed within the software and computer science community to facilitate open process and the creation of open data are now easily accessible to environmental scientists (Yenni et al, 2019). Version control software (e.g., Git, GitHub), open source programming languages (e.g., R, Python), and integrated development environments (IDEs, e.g., RStudio, Spyder) can all be leveraged to dynamically create and share open data products that can build institutional memory.…”
Section: Survey Methodology and Objectives (mentioning)
confidence: 99%
“…The synthesized data product can be used by the research team to create interactive applications for stakeholders to share and explore the data and is also fully integrated into summary reports using software for generating dynamic documents (e.g., using , Xie, 2015 , , Allaire et al, 2018 , Jupyter notebooks, Kluyver et al, 2016). Continuous integration services can automate quality control and regularly update data products as new information is collected (Yenni et al, 2019). The data product also becomes available on an open data repository that is discoverable by other researchers and can contribute to alternative scientific advances beyond the immediate goals (e.g., Hydroshare for the hydrologic sciences, Idaszak et al, 2017).…”
Section: Survey Methodology and Objectives (mentioning)
confidence: 99%
“…How to appropriately manage the update and versioning of these datasets in a research environment is still an open issue. [21][22][23]…”
Section: Introduction (mentioning)
confidence: 99%
“…Historically, a model that has been adopted in workflow analysis is the strictly empirical model, in which mathematical methods with discrete variables such as Petri Nets and Milner [21][22][23], or other methods, analyze information with qualitative characteristics [11,18,24,25] and extract indicators [18,24,26,27] that can serve as a reference for a team of managers or professionals responsible for analysis of workflows or research in various fields of work. Other workflow approximations or causality-based concepts have a much more labor-intensive mathematical model [28][29][30][31][32][33] and although useful for some purposes, they may be costly for firms looking for more streamlined process of workflow analysis solutions, but without spending a lot of time required to complete the analyses. These methods, in some cases, are very useful for analyzing agents and operational procedures from a previous theoretical model or an inductive/intuitive model based on empirical evidence [5,7].…”
Section: Introduction (mentioning)
confidence: 99%
“…These methods, in some cases, are very useful for analyzing agents and operational procedures from a previous theoretical model or an inductive/intuitive model based on empirical evidence [5,7]. However, it does not add to these models concepts that discriminate the adaptability between organisms and objects [1,3,16,[28][29][30] and the predictability of the phenomena [2,16,30] occurring in workflows in several professional areas, due to the presence of continuous variables [1,3,6,10] in the system that are not tracked by traditional approaches. The traditional flowcharts operate by a very large margin of information's compensation not presented in the flow itself, leading the manager and the agents of a firm to implement methods of discussion of the problem, situational methodological analysis, and other structures that are subtracted from the workflow itself.…”
Section: Introduction (mentioning)
confidence: 99%