2011
DOI: 10.14778/3402755.3402766

An algebraic approach for data-centric scientific workflows

Abstract: Scientific workflows have emerged as a basic abstraction for structuring and executing scientific experiments in computational environments. In many situations, these workflows are computationally and data intensive, thus requiring execution in large-scale parallel computers. However, parallelization of scientific workflows remains low-level, ad-hoc and labor-intensive, which makes it hard to exploit optimization opportunities. To address this problem, we propose an algebraic approach (inspired by relational al…

Cited by 77 publications (72 citation statements)
References: 12 publications
“…A SWf W (A, D) is the abstract representation of a directed acyclic graph (DAG) of computational activities A and their data dependencies D. There is a dependency between two activities if one consumes the data produced by the other. An activity is a description of a piece of work and can be a computational script (computational activity), some data (data activity) or some set-oriented algebraic operator like map or filter [11]. The parents of an activity are all activities directly connected to its inputs.…”
Section: Cache Management (mentioning)
confidence: 99%
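The definition quoted above (a workflow W(A, D) as a DAG of activities A and data dependencies D, where the parents of an activity are the activities feeding its inputs) maps directly onto a small graph structure. The following is a minimal Python sketch under stated assumptions. The class names `Activity` and `Workflow`, the `kind` labels, and the example activity names are hypothetical and do not come from the cited paper. Ordering is delegated to the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which also rejects cyclic dependency sets.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Activity:
    name: str
    # Hypothetical labels for the three kinds named in the text:
    # "computational" (a script), "data", or "operator" (e.g. map, filter).
    kind: str


@dataclass
class Workflow:
    """W(A, D): activities A plus data dependencies D."""
    activities: dict = field(default_factory=dict)  # name -> Activity
    dependencies: set = field(default_factory=set)  # (producer, consumer) pairs

    def add(self, activity: Activity) -> None:
        self.activities[activity.name] = activity

    def depend(self, producer: str, consumer: str) -> None:
        # The consumer reads data produced by the producer.
        if producer not in self.activities or consumer not in self.activities:
            raise KeyError("add both activities before linking them")
        self.dependencies.add((producer, consumer))

    def parents(self, name: str) -> list:
        # Parents: all activities directly connected to this activity's inputs.
        return sorted(p for p, c in self.dependencies if c == name)

    def execution_order(self) -> list:
        # TopologicalSorter takes a node -> predecessors mapping and raises
        # graphlib.CycleError (a ValueError) if D does not form a DAG.
        graph = {name: self.parents(name) for name in self.activities}
        return list(TopologicalSorter(graph).static_order())


wf = Workflow()
for name, kind in [("input", "data"), ("simulate", "computational"),
                   ("keep_converged", "operator"), ("plot", "computational")]:
    wf.add(Activity(name, kind))
wf.depend("input", "simulate")
wf.depend("simulate", "keep_converged")
wf.depend("keep_converged", "plot")
print(wf.parents("plot"))    # ['keep_converged']
print(wf.execution_order())  # ['input', 'simulate', 'keep_converged', 'plot']
```

In this sketch, storing D as explicit (producer, consumer) pairs keeps the "consumes the data produced by" relation visible. A real workflow engine would also attach the exchanged data to each edge.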
“…Workflow management systems (SGWs) require workflow specification languages, which can be: (i) graphical (usually associated with graphs or Petri nets); (ii) XML-based (XPDL, etc.); or (iii) based on their own plain-text specification languages [Deelman et al, 2009]. From a more data-centric viewpoint, a workflow is composed of activities, each of which is a software component capable of executing programs with given input and output parameters [Ogasawara et al, 2011]. These parameters are used to define the dependencies between the activities of a workflow.…”
Section: Data-Centric Workflows (unclassified)
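The data-centric view quoted above says that activity dependencies are defined by input and output parameters. One way to make that concrete is to derive an edge from activity P to activity C whenever some output parameter of P is an input parameter of C. The following Python sketch assumes that parameters are matched by name. That matching rule, the `Activity` class, `derive_dependencies`, and the example activity and parameter names are all hypothetical illustrations, not the cited paper's definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    inputs: frozenset   # names of input parameters (hypothetical naming scheme)
    outputs: frozenset  # names of output parameters


def derive_dependencies(activities):
    """Return (producer, consumer) pairs: consumer reads an output of producer."""
    edges = set()
    for producer in activities:
        for consumer in activities:
            shared = producer.outputs & consumer.inputs
            if producer.name != consumer.name and shared:
                edges.add((producer.name, consumer.name))
    return edges


activities = [
    Activity("mesh", frozenset({"geometry"}), frozenset({"grid"})),
    Activity("solve", frozenset({"grid", "params"}), frozenset({"field"})),
    Activity("render", frozenset({"field"}), frozenset({"image"})),
]
print(sorted(derive_dependencies(activities)))
# [('mesh', 'solve'), ('solve', 'render')]
```

Here `geometry` and `params` have no producing activity, so in this sketch they act as workflow-level inputs. The quadratic pairwise scan is adequate for small workflows. An index from parameter name to producers would scale better.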
“…Thus, mechanisms that optimize the production and consumption of intermediate results, and that decide which activities should be executed first and which should be executed together (pipelined), become highly relevant. Algebraic approaches [Jergler et al, 2015; Rheinländer et al, 2015; Ogasawara et al, 2011] seek to optimize workflow execution by taking into account the execution context, the data and their respective metadata, and provenance information. Salloum et al [2016] describe current Big Data scenarios and the most relevant Spark features for efficient data processing and management in the Hadoop ecosystem, including integration with the HDFS file system.…”
Section: Data-Centric Workflows (unclassified)
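The pipelining idea in the statement above (executing consecutive activities together so that intermediate results are not fully materialized) can be illustrated with Python generators. This is a generic sketch, not the optimizer of any of the cited algebraic approaches. The helpers `map_op`, `filter_op`, `pipelined`, and `staged` are hypothetical names, and records are assumed to be plain dictionaries.

```python
from itertools import count, islice


def map_op(fn):
    # 1:1 operator: transform each record.
    def run(records):
        for r in records:
            yield fn(r)
    return run


def filter_op(pred):
    # Keep only records satisfying the predicate.
    def run(records):
        for r in records:
            if pred(r):
                yield r
    return run


def pipelined(*ops):
    # Chain generators: each record flows through every operator before the
    # next one is read, so no intermediate relation is materialized.
    def run(records):
        stream = iter(records)
        for op in ops:
            stream = op(stream)
        return stream
    return run


def staged(*ops):
    # Baseline: run each operator to completion and store its full output.
    def run(records):
        data = list(records)
        for op in ops:
            data = list(op(data))
        return data
    return run


square = map_op(lambda r: {**r, "x": r["x"] ** 2})
keep_even = filter_op(lambda r: r["x"] % 2 == 0)

rows = [{"id": i, "x": i} for i in range(6)]
print(staged(square, keep_even)(rows) == list(pipelined(square, keep_even)(rows)))  # True

# Only the pipelined form can consume an unbounded input lazily.
unbounded = ({"id": i, "x": i} for i in count())
print([r["id"] for r in islice(pipelined(square, keep_even)(unbounded), 3)])  # [0, 2, 4]
```

Both forms produce the same result. The staged form holds every intermediate relation in memory, while the pipelined form holds only the record currently in flight. That trade-off is what makes the choice of which activities to group worth optimizing.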