2013 IEEE International Conference on Cluster Computing (CLUSTER) 2013
DOI: 10.1109/cluster.2013.6702685

Distributed data provenance for large-scale data-intensive computing

Abstract: It has become increasingly important to capture and understand the origins and derivation of data (its provenance). A key issue in evaluating the feasibility of data provenance is its performance, overheads, and scalability. In this paper, we explore the feasibility of a general metadata storage and management layer for parallel file systems, in which metadata includes both file operations and provenance metadata. We experimentally investigate the design optimality: whether provenance metadata should b…

Cited by 41 publications (20 citation statements)
References 28 publications

“…By decoupling metadata and data, we are able to apply flexible strategies on metadata management and data I/Os. Prior work [23,24] also shows that a distributed hash table offers a flexible yet efficient means for tracking applications' provenance.…”
Section: Discussion (mentioning)
confidence: 99%
“…More recently, Zhao et al [48] proposed using both a distributed hash table (FusionFS [49]) and a centralized database (SPADE [47]) to manage the metadata. Similarly to us, their metadata model includes both file operations and provenance information.…”
Section: Related Work (mentioning)
confidence: 99%
“…We believe HyCache+, together with other features such as data compression [52] and data provenance [37,49], would make the next generation extreme-scale storage system (e.g. [47]) more practical for real applications [30].…”
Section: F Broader Impact (mentioning)
confidence: 99%