2018
DOI: 10.1007/s41060-018-0107-0

The many faces of data-centric workflow optimization: a survey

Abstract: Workflow technology is rapidly evolving and, rather than being limited to modeling the control flow in business processes, is becoming a key mechanism to perform advanced data management, such as big data analytics. This survey focuses on data-centric workflows (or workflows for data analytics or data flows), where a key aspect is data passing through and getting manipulated by a sequence of steps. The large volume and variety of data, the complexity of operations performed, and the long time such workflows ta…
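The abstract characterizes a data-centric workflow as data passing through, and being manipulated by, a sequence of steps. The sketch below illustrates that idea in Python under simplifying assumptions: a strictly linear pipeline over in-memory records. The function and step names (`run_workflow`, `drop_missing_amount`, `add_amount_with_tax`, `keep_large`) and the 1.2 multiplier are hypothetical illustrations, not taken from the survey.

```python
from typing import Any, Callable, Dict, List

# A record is a plain dict; a step maps a list of records to a new list.
Record = Dict[str, Any]
Step = Callable[[List[Record]], List[Record]]


def run_workflow(records: List[Record], steps: List[Step]) -> List[Record]:
    """Pass the data through each step in order (a linear data flow)."""
    for step in steps:
        records = step(records)
    return records


# Hypothetical example steps: clean, transform, filter.
def drop_missing_amount(records: List[Record]) -> List[Record]:
    # Cleaning step: discard records without an amount.
    return [r for r in records if r.get("amount") is not None]


def add_amount_with_tax(records: List[Record]) -> List[Record]:
    # Transformation step: derive a new field (1.2 is an arbitrary rate).
    return [{**r, "amount_with_tax": round(r["amount"] * 1.2, 2)} for r in records]


def keep_large(records: List[Record]) -> List[Record]:
    # Filtering step: depends on the field created by the previous step.
    return [r for r in records if r["amount_with_tax"] > 100]


if __name__ == "__main__":
    data = [
        {"id": 1, "amount": 50.0},
        {"id": 2, "amount": None},
        {"id": 3, "amount": 100.0},
    ]
    result = run_workflow(data, [drop_missing_amount, add_amount_with_tax, keep_large])
    print(result)  # [{'id': 3, 'amount': 100.0, 'amount_with_tax': 120.0}]
```

Because each step consumes the previous step's output, the step order is part of the workflow definition: here `keep_large` cannot run before `add_amount_with_tax`, since it reads a field that step creates. Many real data flows are DAG-shaped rather than strictly linear; this sketch covers only the linear case.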

Cited by 37 publications (18 citation statements)
References 109 publications (232 reference statements)

“…Much work in the fields of AI and machine learning has been conducted over the past few years to automate each of these steps. For data cleaning, Kougka et al provide a survey of different techniques that can be used to automate the data cleaning process (also known as ETL or Extract/Transform/Load) [42]. Feature engineering is often considered to be the most time consuming step of the data science process [33], and techniques such as Deep Feature Synthesis [35] and One Button Machine [43] automate the task of generating ML-ready features from multi-relational databases.…”
Section: What Is AutoAI? | mentioning
confidence: 99%
“…Liew et al [22] have recently analyzed selected workflow management systems (WMSs) that are widely used by the scientific community, namely: Airavata [24], Kepler [20], KNIME [6], Meandre [23], Pegasus [11], Taverna [35] and Swift [36]. Such systems have been analyzed with respect to a framework aiming at capturing the major facets characterizing WMSs: (a) processing elements, i.e., the building blocks of workflows envisaged to be either web services or executable programs; (b) coordination method, i.e., the mechanism controlling the execution of the workflow elements envisaged to be either orchestration or choreography; (c) workflow representation, i.e., the specification of a workflow that can meet two goals human representation and/or computer communication; (d) data processing model, i.e., the mechanism through which the processing elements process the data that can be bulk data or stream data; (e) optimization stage, i.e., when optimization of the workflow (if any) is expected to take place that can either be build time or runtime (e.g., data workflow processing optimization [19].…”
Section: Related Work | mentioning
confidence: 99%
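The facets of the WMS framework quoted above map naturally onto a small typed record. The Python sketch below encodes only facets (a) through (e) as they appear in the excerpt; the quotation is truncated, so the framework may contain more. All class and field names, and the example system, are hypothetical illustrations; the sketch does not classify any of the WMSs named in the excerpt.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class ProcessingElement(Enum):      # (a) building blocks of a workflow
    WEB_SERVICE = "web service"
    EXECUTABLE_PROGRAM = "executable program"


class Coordination(Enum):           # (b) how element execution is controlled
    ORCHESTRATION = "orchestration"
    CHOREOGRAPHY = "choreography"


class RepresentationGoal(Enum):     # (c) goals the specification serves
    HUMAN_REPRESENTATION = "human representation"
    COMPUTER_COMMUNICATION = "computer communication"


class DataProcessingModel(Enum):    # (d) how elements consume data
    BULK = "bulk"
    STREAM = "stream"


class OptimizationStage(Enum):      # (e) when optimization happens, if at all
    NONE = "none"
    BUILD_TIME = "build time"
    RUNTIME = "runtime"


@dataclass(frozen=True)
class WMSProfile:
    """One system's position on facets (a)-(e); field names are illustrative."""
    name: str
    processing_element: ProcessingElement
    coordination: Coordination
    representation: FrozenSet[RepresentationGoal]  # "and/or": one or both goals
    data_model: DataProcessingModel
    optimization: OptimizationStage

    def __post_init__(self) -> None:
        # Facet (c) is "and/or", so at least one goal must be present.
        if not self.representation:
            raise ValueError("representation needs at least one goal")


if __name__ == "__main__":
    # A hypothetical system, not one of the WMSs analyzed in the excerpt.
    example = WMSProfile(
        name="HypotheticalWMS",
        processing_element=ProcessingElement.EXECUTABLE_PROGRAM,
        coordination=Coordination.ORCHESTRATION,
        representation=frozenset({RepresentationGoal.HUMAN_REPRESENTATION,
                                  RepresentationGoal.COMPUTER_COMMUNICATION}),
        data_model=DataProcessingModel.BULK,
        optimization=OptimizationStage.BUILD_TIME,
    )
    print(example.coordination.value)  # orchestration
```

Facets (a), (b), and (d) are modeled as single values because the excerpt phrases them as "either ... or"; a system supporting both options would need sets there too, as facet (c) already uses.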
“…In ETL, there are also a variety of approaches that seek to reduce the amount of manual labour required. For example, this includes the provision of language features [4,32] or patterns [31,33] that support recurring data preparation behaviours, techniques for managing evolution of ETL programs [8], and development of ETL processes that abstract over more concrete implementation details [3,22]. However, although such work focuses on raising the abstraction levels at which data engineers engage in data preparation tasks, we are not aware of prior results that use feedback on data products to make changes across complete data preparation processes.…”
Section: Related Work | mentioning
confidence: 99%