Trade-Offs in Automatic Provenance Capture

Stamatogiannakis, Manolis; Kazmi, Hasanat; Sharif, Hashim; Vermeulen, Remco; Gehani, Ashish; Bos, Herbert; Groth, Paul

doi:10.1007/978-3-319-40593-3_3

Cited by 10 publications

(12 citation statements)

References 16 publications


“…Our implementation enables the UDFs to specify what items in an object or a group were “sub-selected,” while also capturing the relationship to the broader object or group. In contrast to SubZero’s [28] or to event logging [22], [22], [27], our model captures equivalences among computations (including equivalences that hold for particular datatypes and UDFs). PROVision’s query optimizer exploits these to “trace” provenance and aid in troubleshooting.…”

Section: Prior Work (mentioning)

confidence: 99%

Fine-Grained Provenance for Matching & ETL

Zheng

Alawini

2019

2019 IEEE 35th International Conference on Data Engineering (ICDE)


Data provenance tools capture the steps used to produce analyses. However, scientists must choose among workflow provenance systems, which allow arbitrary code but only track provenance at the granularity of files; provenance APIs, which provide tuple-level provenance but incur overhead in all computations; and database provenance tools, which track tuple-level provenance through relational operators and support optimization, but support a limited subset of data science tasks. None of these solutions is well suited for tracing errors introduced during common ETL, record alignment, and matching tasks for data types such as strings, images, etc. Scientists need new capabilities to identify the sources of errors, find why different code versions produce different results, and identify which parameter values affect output. We propose PROVision, a provenance-driven troubleshooting tool that supports ETL and matching computations and traces extraction of content within data objects. PROVision extends database-style provenance techniques to capture equivalences, support optimizations, and enable selective evaluation. We formalize our extensions, implement them in the PROVision system, and validate their effectiveness and scalability for common ETL and matching tasks.



“…Associating them to results can improve both fine-tuning and data analyses at runtime. Despite the several solutions available for making applications provenance-aware [5][6][7], capturing provenance data in CSE applications is still an open issue. The challenges are mainly related to performance and provenance granularity.…”

Section: Introduction (mentioning)

confidence: 99%

“…The challenges are mainly related to performance and provenance granularity. Stamatogiannakis et al [5] evaluated tradeoffs in provenance capture mechanisms. They consider that solutions that are easy to deploy collect provenance in a very fine grain and present a significant overhead, while solutions that are based on function calls present low overhead and granularity is controlled by the code instrumentation.…”

Section: Introduction (mentioning)

confidence: 99%

Capturing Provenance for Runtime Data Analysis in Computational Science and Engineering Applications

Silva

Souza

Camata

et al. 2018

Lecture Notes in Computer Science


Capturing provenance data for runtime analysis has several challenges in high performance computational science engineering applications. The main issues are avoiding significant overhead in data capture, loading and runtime query support; and coupling provenance capture mechanisms with applications built with highly efficient numerical libraries, and visualization frameworks targeted to high performance environments. This work presents DfA-prov, an approach to capture provenance data and domain data aiming at high performance applications.


“…in the analysis and choice of hyperparameters for training. There are several approaches to capturing provenance data [Stamatogiannakis et al 2016]. Automatic data capture approaches have very fine granularity, generating significant overhead in the execution of scripts, especially large-scale ones.…”

unclassified

Análise de Hiperparâmetros em Aplicações de Aprendizado Profundo por meio de Dados de Proveniência

Pina

Neves

Paes

et al. 2019

Anais Do XXXIV Simpósio Brasileiro De Banco De Dados (SBBD 2019)


Training Convolutional Neural Networks (CNNs) requires hyperparameter tuning. Existing solutions that help choose the best hyperparameter combinations define their own representation to model data derivation relationships. This proprietary representation hinders data analysis and interoperability. This paper proposes CNNProv, which adopts the W3C PROV standard to represent data derivation relationships in order to facilitate the analysis of hyperparameter combinations, thereby contributing to the CNN training phase. CNNProv captures provenance data and enables the analysis of hyperparameter values at runtime. The experiments show the suitability of W3C PROV for hyperparameter analysis and that it contributes to the quality and reliability of CNN results, with a negligible overhead of at most 4%.

