2016 IEEE High Performance Extreme Computing Conference (HPEC) 2016
DOI: 10.1109/hpec.2016.7761589

High-throughput ingest of data provenance records into Accumulo

Abstract: Whole-system data provenance provides deep insight into the processing of data on a system, including detecting data integrity attacks. The downside to systems that collect whole-system data provenance is the sheer volume of data that is generated under many heavy workloads. In order to make provenance metadata useful, it must be stored somewhere where it can be queried. This problem becomes even more challenging when considering a network of provenance-aware machines all collecting this metadata. In this pape…

citations
Cited by 11 publications
(10 citation statements)
references
References 14 publications
“…to identify the subset of images from a specific event or location. It is implemented using the D4M paradigm with the Accumulo NoSql backend for storage [13]- [15] with Accumulo considered the one of the highest performing databases and widely used for government applications [16]. This paradigm supports sparse storage, and easy integration with analysis tools in python and matlab/octave.…”
Section: E Data Indexing With Accumulo (mentioning)
confidence: 99%
“…Further, Moyer et al evaluated the storage requirements of provenance when used for security purposes in relatively modest distributed systems [21]. In such a context, several thousands of graph elements can be generated per second and per machine, resulting in a graph containing billions of nodes to represent system execution over several months.…”
Section: Where Does the Audit Live? (mentioning)
confidence: 99%
“…One of the main hurdles of system-level provenance capture is the sheer amount of data generated [13,64]. One approach to this issue is to improve provenance ingest to storage [64] and provenance query [90]. A second approach is to reduce the amount of provenance data captured.…”
Section: Related Work (mentioning)
confidence: 99%