2019 IEEE 35th International Conference on Data Engineering (ICDE) 2019
DOI: 10.1109/icde.2019.00179

Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization

Abstract: Along with the prosperous data science activities, the importance of provenance during data science project lifecycle is recognized and discussed in recent data science systems research. Increasingly modern data science platforms today have nonintrusive and extensible provenance ingestion mechanisms to collect rich provenance and context information, handle modifications to the same file using distinguishable versions, and use graph data models (e.g., property graphs) and query languages (e.g., Cypher) to repr…

Cited by 9 publications (5 citation statements)
References 54 publications
“…Data science helps to process and supports the derivation of valuable data [ 59 ]. Figure 8 illustrates the data science lifecycle [ 12 , 60 ]. The lifecycle is based on the seven iterative steps such as data cleaning, data exploration, and data mining, etc.…”
Section: Next-generation Advancements In the Internet Of Things (IoT)... (mentioning)
confidence: 99%
“…The issues of inter-connectedness and size of provenance graphs have similarly emerged in different domains, wherein techniques such as user views, segmentation, and aggregation have been explored to transform the graphs to usable or interpretable ones [9,17,37,38]. We adopt a similar approach but we leverage the semantics of production-ML operators and connections between them.…”
Section: Model Graphlets (mentioning)
confidence: 99%
“…Previous work has even led to the standardization of provenance representations for workflows in the form of graphs [26,39,40]. Other research has proposed various ways to explore and analyze such provenance graphs, e.g., visualization [14], reachability query support [13], support for user-defined views [17], segmentation and summarization [9,10,37]. Our work introduces a framework to segment ML provenance graphs and demonstrates how this segmentation leads to further analysis and optimizations for ML pipelines.…”
Section: Introduction (mentioning)
confidence: 99%
“…Furthermore, given information about the place data originated from, how they come in their present states, and who or what acted on them helps users to establish trust in the data. Provenance can show resources and relations that have affected the construction of the output data and are commonly expressed as directed graphs (digraphs) [17]. The primary aim of the W3C standardized provenance is to enable the extensive publication and exchange of provenance over the web [18].…”
Section: Conceptual View Of Data Provenance (mentioning)
confidence: 99%