2013
DOI: 10.1186/1471-2105-14-s11-s6

Provenance in bioinformatics workflows

Abstract: In this work, we used the PROV-DM model to manage data provenance in the workflows of genome projects. This provenance model stores the details of a single workflow execution, e.g., the raw and produced data and the computational tools used, together with their versions and parameters. Using this model, biologists can access the details of a particular workflow execution, compare the results produced by different executions, and plan new experiments more efficiently. In addition, a provenance simulator was created, which facili…
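The abstract describes storing, for each workflow execution, the data consumed and produced and the tools run with their versions and parameters, and then comparing executions. The sketch below illustrates that idea with the three PROV-DM core types (Entity, Activity, Agent) and three of its relations (used, wasGeneratedBy, wasAssociatedWith) as plain Python dataclasses. It is not the authors' implementation and does not use any provenance library; the class names, the step name "assembly", the tool "assembler" and the parameter `k` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """PROV-DM entity: a piece of data, e.g. a raw read file or an assembly."""
    id: str


@dataclass(frozen=True)
class Agent:
    """PROV-DM agent: here, a computational tool identified by name and version."""
    id: str
    version: str


@dataclass
class Activity:
    """PROV-DM activity: one step of one workflow execution, with its parameters."""
    id: str
    parameters: dict = field(default_factory=dict)


@dataclass
class ExecutionProvenance:
    """Provenance of a single workflow execution, stored as PROV-DM-style relations."""
    used: list = field(default_factory=list)                 # (activity, entity)
    was_generated_by: list = field(default_factory=list)     # (entity, activity)
    was_associated_with: list = field(default_factory=list)  # (activity, agent)

    def record_step(self, activity, agent, inputs, outputs):
        """Record that `agent` ran `activity`, consuming inputs and producing outputs."""
        self.was_associated_with.append((activity, agent))
        self.used.extend((activity, e) for e in inputs)
        self.was_generated_by.extend((e, activity) for e in outputs)

    def step_settings(self):
        """Map each activity id to (tool, tool version, parameters)."""
        return {
            act.id: (agent.id, agent.version, dict(act.parameters))
            for act, agent in self.was_associated_with
        }


def compare_executions(a, b):
    """Return the steps whose tool, version or parameters differ between two runs."""
    sa, sb = a.step_settings(), b.step_settings()
    return {
        step: (sa.get(step), sb.get(step))
        for step in sorted(set(sa) | set(sb))
        if sa.get(step) != sb.get(step)
    }


# Usage with hypothetical file, tool and parameter names.
reads = Entity("reads.fastq")
run1, run2 = ExecutionProvenance(), ExecutionProvenance()
run1.record_step(Activity("assembly", {"k": 31}), Agent("assembler", "1.0"),
                 [reads], [Entity("contigs_run1.fasta")])
run2.record_step(Activity("assembly", {"k": 55}), Agent("assembler", "1.0"),
                 [reads], [Entity("contigs_run2.fasta")])
print(compare_executions(run1, run2))
# {'assembly': (('assembler', '1.0', {'k': 31}), ('assembler', '1.0', {'k': 55}))}
```

Keeping one provenance record per execution makes the comparison a simple diff over recorded steps. A fuller design would also follow wasGeneratedBy/used chains to explain why two outputs differ.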

Cited by 17 publications (8 citation statements). References: 19 publications.
“…De Paula et al. [36] proposed the management of data provenance for genome projects using PROV-DM. Their results demonstrated PROV-DM as a suitable model for storing properties for each execution of Bioinformatics workflows, and one which also provided graphical representation for the large volumes of data generated by genome projects using the Entity Collections.…”
Section: Related Work | Citation type: mentioning | Confidence: 99%
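The statement above credits PROV-DM Entity Collections with keeping the provenance graph readable for the large data volumes of genome projects. PROV-DM defines a Collection as an entity that has other entities as members (the hadMember relation). The sketch below shows only the grouping effect on a graph: an activity that uses one collection contributes one `used` edge rather than one edge per file. The `Collection` class, the `used_edges` helper and the lane file names are hypothetical and are not taken from the cited work.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    id: str


@dataclass
class Collection:
    """PROV-DM-style collection: an entity whose members are other entities (hadMember)."""
    id: str
    had_member: list = field(default_factory=list)


def used_edges(activity_id, inputs, expand_collections=False):
    """Edges a provenance graph would draw for an activity's inputs.

    By default a collection is drawn as a single node with one `used` edge;
    with expand_collections=True its hadMember edges are added as well.
    """
    edges = []
    for item in inputs:
        edges.append((activity_id, "used", item.id))
        if expand_collections and isinstance(item, Collection):
            edges.extend((item.id, "hadMember", m.id) for m in item.had_member)
    return edges


# Hypothetical example: eight lanes of reads grouped into one collection.
reads = Collection("run_reads", [Entity(f"lane{i}.fastq") for i in range(1, 9)])
print(len(used_edges("assembly", [reads])))                           # 1
print(len(used_edges("assembly", [reads], expand_collections=True)))  # 9
```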
“…We argue that integrated web applications, involving scientific workflows and databases, can hide the complexity of underlying scientific software by abstracting away cumbersome aspects, such as managing files and setting command-line parameters, leading to increased productivity for scientists. One important aspect of enabling reproducible computational analyses is keeping track of the computational environment components, i.e., operating system, libraries, software packages and their respective versions (de Paula et al, 2013).…”
Section: /18 | Citation type: mentioning | Confidence: 99%
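The statement above stresses recording the computational environment (operating system, libraries, software packages and their versions) alongside a provenance record. As an illustration only, the sketch below uses the Python standard library (`platform` and `importlib.metadata`) to snapshot these components and report what differs between two snapshots. The function names and the package list are hypothetical. Tools outside Python, such as compiled bioinformatics programs, would need their own version capture.

```python
import platform
from importlib import metadata


def capture_environment(packages):
    """Snapshot OS, interpreter and package versions for a provenance record."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "python": platform.python_version(),
        "packages": versions,
    }


def environment_differences(a, b):
    """Return the components whose values differ between two snapshots."""
    diffs = {k: (a[k], b[k]) for k in ("os", "os_release", "python") if a[k] != b[k]}
    for name in sorted(set(a["packages"]) | set(b["packages"])):
        va, vb = a["packages"].get(name), b["packages"].get(name)
        if va != vb:
            diffs[name] = (va, vb)
    return diffs


# Usage: record the environment next to each workflow execution.
env = capture_environment(["numpy"])
print(env["os"], env["python"], env["packages"])
```

Storing `None` for missing packages, rather than skipping them, keeps an absent dependency visible when two executions are compared.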
“…One emerging issue is keeping track of all of the tools, workflows, and parameters used in a bioinformatics project. To ensure data integrity, reproducibility and optimization of workflow performance, data provenance models, such as BioQ (Saccone et al 2012), SemPoD (Jayapandian et al 2012), and PROV-DM model (de Paula et al 2013), have been developed to track the origin of the biological data and the processes used to analyze it.…”
Section: Dataset Analysis | Citation type: mentioning | Confidence: 99%