2019
DOI: 10.1109/access.2019.2903727

Why-Diff: Exploiting Provenance to Understand Outcome Differences From Non-Identical Reproduced Workflows

Abstract: Data analytics processes such as scientific workflows tend to be executed repeatedly, with varying dependencies and input datasets. The case has been made in the past for tracking the provenance of the final information products through the workflow steps, to enable their reproducibility. In this paper, we explore the hypothesis that provenance traces recorded during execution are also instrumental to answering questions about the observed differences between sets of results obtained from similar but not ident…
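The general idea in the abstract, using recorded provenance to explain why two similar runs produced different results, can be illustrated with a small sketch. This is not the Why-Diff algorithm from the paper. It is a minimal Python sketch that assumes a hypothetical per-step provenance schema (`StepRecord`, `Trace` and `explain_differences` are illustrative names), where each step records its inputs, its parameters or dependency versions, and a digest of its output. For every step whose output differs between run A and run B, the sketch lists changed parameters and upstream steps whose outputs also changed.

```python
from dataclasses import dataclass, field


@dataclass
class StepRecord:
    """Provenance recorded for one workflow step in one run (hypothetical schema)."""
    inputs: list[str]                                     # upstream steps or datasets consumed
    params: dict[str, str] = field(default_factory=dict)  # e.g. tool versions, settings
    output_digest: str = ""                               # checksum of the step's output


Trace = dict[str, StepRecord]  # step name -> provenance record (assumed layout)


def explain_differences(a: Trace, b: Trace) -> dict[str, list[str]]:
    """Map each step whose outcome differs between runs A and B to candidate causes."""
    report: dict[str, list[str]] = {}
    for step in sorted(a.keys() | b.keys()):
        if step not in a or step not in b:
            report[step] = [f"step only present in run {'A' if step in a else 'B'}"]
            continue
        ra, rb = a[step], b[step]
        if ra.output_digest == rb.output_digest:
            continue  # same outcome: nothing to explain
        reasons = []
        # Candidate local causes: parameters or dependency versions that changed.
        for key in sorted(ra.params.keys() | rb.params.keys()):
            if ra.params.get(key) != rb.params.get(key):
                reasons.append(f"param {key}: {ra.params.get(key)} -> {rb.params.get(key)}")
        # Candidate propagated causes: inputs that changed upstream.
        for up in sorted(set(ra.inputs) | set(rb.inputs)):
            if up not in ra.inputs or up not in rb.inputs:
                reasons.append(f"input {up} used in only one run")
            elif up in a and up in b and a[up].output_digest != b[up].output_digest:
                reasons.append(f"upstream {up} changed")
        report[step] = reasons or ["no recorded cause (possible non-determinism)"]
    return report


# Usage: the same three-step workflow rerun with a different aligner version.
run_a = {
    "raw": StepRecord(inputs=[], output_digest="d1"),
    "align": StepRecord(inputs=["raw"], params={"aligner": "1.0"}, output_digest="x1"),
    "call": StepRecord(inputs=["align"], output_digest="y1"),
}
run_b = {
    "raw": StepRecord(inputs=[], output_digest="d1"),
    "align": StepRecord(inputs=["raw"], params={"aligner": "1.1"}, output_digest="x2"),
    "call": StepRecord(inputs=["align"], output_digest="y2"),
}
print(explain_differences(run_a, run_b))
# {'align': ['param aligner: 1.0 -> 1.1'], 'call': ['upstream align changed']}
```

In this sketch, a step whose only listed reason is an upstream change is treated as a downstream effect, while a step with changed parameters or inputs is a candidate root cause of the outcome difference. A real provenance model (for example one based on W3C PROV) would carry much richer records than this assumed schema.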

References: 25 publications
Cited by: 8 publications (3 citation statements)
“…They also provide detailed characteristics about the respective provenance approaches.
2009-2011: 6 [10], [29], [30], [31], [32], [33]
2012-2014: 13 [1], [2], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44]
2015-2017: 9 [4], [45], [46], [47], [48], [49], [50], [51], [52]
2018-2020: 21 [9], [11], [53], [54], [55] [70], [71]
Target domain: (Bio-) medical or healthcare domain: 36 [1], [2], [4], [9], [10], [11], [28], [29], [31], [34], [36], [37], [38], [39], [40], [42], [43], [44], [46], [47], [48], [49], [51]…”
Section: Literature Search | Type: mentioning | Confidence: 99%
“…Provenance collection is common in workflow systems where it is built directly into the execution environment, such as in Kepler (Altintas et al, 2006), VisTrails (Koop et al, 2013), and Taverna (Missier et al, 2008). Of particular interest is the work of de Oliveira et al (2014) who use provenance to debug long-running workflows, and Why-Diff (Thavasimani et al, 2019) which compares provenance of multiple workflow executions to find differences. Provenance collection in programming languages is much less common, with the exception of the noWorkflow (Murta et al, 2014) implementation for Python.…”
Section: Related Work | Type: mentioning | Confidence: 99%
“…Execution provenance helps to understand the results of reproduced workflows. Extracting and understanding the provenance for tracking the changes in the reproduced workflows is widely discussed and demonstrated in the papers [23] and [24].…”
Section: Scientific Reproducibility | Type: mentioning | Confidence: 99%