2021
DOI: 10.1007/s13222-021-00387-7

Collecting and visualizing data lineage of Spark jobs

Abstract: Metadata management constitutes a key prerequisite for enterprises as they engage in data analytics and governance. Today, however, the context of data is often only manually documented by subject matter experts, and lacks completeness and reliability due to the complex nature of data pipelines. Thus, collecting data lineage (describing the origin, structure, and dependencies of data) in an automated fashion increases the quality of the provided metadata and reduces manual effort, making it critical for the development …

citations
Cited by 1 publication
(1 citation statement)
references
References 15 publications
“…Collecting data is the stage of data that has been collected. The data used is sourced from kaggle.com, which is opensource (Schoenenwald et al, 2021).…”
Section: Collecting Data
Citation type: mentioning
confidence: 99%