2021
DOI: 10.12688/f1000research.29032.1

Sustainable data analysis with Snakemake

Abstract: Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact …

Cited by 567 publications (618 citation statements)
References 35 publications
“…All of the code underlying this daily lineage tracking web-report can be found at GitHub and Zenodo 12 . grinch is a python-based tool, the analysis pipeline of which is built on a snakemake backbone 13 . Every 24 hours a scheduled cron 14 task runs on our local servers.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Runtime and peak memory usage of RaDsex (version 1.1.2) was measured with the "benchmark" directive of snakemake (Mölder et al, 2021) on the Genotoul computational platform using four threads.…”
Section: Performance Measurements
Citation type: mentioning
confidence: 99%
“…The computer code and input data necessary to reproduce all analyses described in this paper are available on GitHub at https://github.com/jbloom/SARS-CoV-2_PRJNA612766. This GitHub repository includes a Snakemake (Mölder et al 2021) pipeline that fully automates all steps in the analysis except for downloading of sequences from GISAID, which must be done manually as described in the GitHub repository's README in order to comply with GISAID data sharing terms.…”
Section: Code and Data Availability
Citation type: mentioning
confidence: 99%