Representing Interoperable Provenance Descriptions for ETL Workflows

Freitas, André; Kämpgen, Benedikt; Oliveira, João Gabriel; Ó’Riain, Seán; Curry, Edward

doi:10.1007/978-3-662-46641-4_4

Cited by 5 publications

(5 citation statements)

References 12 publications

“…Provenance tracking for web data has been concretely explored by Freitas et al (2011). In this study, Prov4J (Freitas et al, 2010) - a framework for constructing a provenance system for web data - utilizing semantic web technologies has been presented. Prov4J utilizes resource definition framework (RDF) to represent provenance information and URIs (called ProvURIs) to associate the information resource in an application with its provenance descriptor stored in a provenance repository.…”

Section: Related Work (mentioning)

confidence: 99%

“…Provenance is not only used for describing the origin of result data, but also for explaining the process of data aggregation, assessing the quality of data and examining the execution of web-based data access. Several studies in this context (Hartig, 2009; Freitas et al, 2010; …) utilize provenance data models that are compatible with the community standard - the Open Provenance Model (OPM) (Moreau et al, 2011). However, because of flexibility in representing data and the support of interoperability in various provenance systems (Eckert et al, 2014), W3C PROV (Gil & Miles, 2016) - a new worldwide provenance standard - is utilized as a data model for our provenance solution.…”

Section: Related Work (mentioning)

confidence: 99%

Provenance in Web Feed Mash-Up Systems

Sansrimahachai (2016). International Journal of Information Technology and Web Engineering.

The recent emergence of Web 2.0 technologies and rich internet applications is driving the development of a new class of applications that combine data from diverse sources, which we refer to as “mash-ups.” One of the most popular kinds is the web feed mash-up, which relies on syndication technologies such as RSS and Atom. This kind of mash-up aggregates web feeds derived from multiple news websites or blogs and presents them promptly in a single interface. In such systems, it is difficult to know exactly how feed results in data mash-ups are generated. In particular, it is difficult for users to determine whether information can be trusted. Therefore, web feed mash-ups need to support a mechanism capable of recording and querying provenance information, that is, information about the process that led to result data. In this paper, the author proposes a provenance tracking solution that enables provenance functionality in web feed mash-ups. He demonstrates how the provenance of feed mash-up results can be determined by means of a provenance query algorithm. To tackle the storage problem resulting from the persistence of intermediate web feeds, a novel storage optimization method is introduced. Finally, the author evaluates his provenance solution in terms of storage consumption for provenance collection, demonstrating significant reductions in storage size and achieving reasonable storage overheads.

“…Community efforts towards the convergence into a common provenance model led to the Open Provenance Model (OPM). OPM descriptions allow interoperability on the level of workflow structure.…”

Section: Provenance Model (mentioning)

confidence: 99%

Capturing Interactive Data Transformation Operations Using Provenance Workflows

Omitola, Freitas, Curry, et al. (2015). Lecture Notes in Computer Science. Self-citation.

Abstract. The ready availability of data is leading to the increased opportunity of their re-use for new applications and for analyses. Most of these data are not necessarily in the format users want, are usually heterogeneous, and highly dynamic, and this necessitates data transformation efforts to re-purpose them. Interactive data transformation (IDT) tools are becoming easily available to lower these barriers to data transformation efforts. This paper describes a principled way to capture data lineage of interactive data transformation processes. We provide a formal model of IDT, its mapping to a provenance representation, and its implementation and validation on Google Refine. Provision of the data transformation process sequences allows assessment of data quality and ensures portability between IDT and other data transformation platforms. The proposed model showed a high level of coverage against a set of requirements used for evaluating systems that provide provenance management solutions.

“…The authors used data integration toolkits as a workflow framework to support their LOD publishing process and provenance gathering facilities. Later, [13] and [20] investigated complementary perspectives of such approach. The first author described a vocabulary focused on modeling data transformation workflows.…”

Section: Provenance Initiatives (mentioning)

confidence: 99%

“…In order to represent the published linked provenance data, the Data Preparation and Transformation process adopts the semantic approach proposed by [13]. This approach was used in the provenance management architecture presented by [23] as described at the end of section 3.…”

Section: Data Preparation and Transformation Process Provenance (mentioning)

confidence: 99%

Lop

Mendonça, Cruz, Cerda, et al. (2013). Proceedings of the Fifth Workshop on Semantic Web Information Management.

The Web of Data has emerged as a means to expose, share, reuse, and connect information on the Web identified by URIs using RDF as a data model, following Linked Data Principles. However, the reuse of third-party data can be compromised without proper data quality assessments. In this context, important questions emerge: how can one trust published data and links? Which manipulation, modification, and integration operations have been applied to the data before its publication? What is the nature of comparisons or transformations applied to data during the interlinking process? In this scenario, provenance becomes a fundamental element. In this paper, we describe an approach for generating and capturing Linked Open Provenance (LOP) to support data quality and trustworthiness assessments, which covers preparation and format transformation of traditional data sources, up to dataset publication and interlinking. The proposed architecture takes advantage of provenance agents, orchestrated by an ETL workflow approach, to collect provenance at any specified level and also link it with its corresponding data. We also describe a real use case scenario where the architecture was implemented to evaluate the proposal.
