Workflows have been traditionally a mean to describe and implement the computing experiments, usually parametric studies and explorations searching for the best solution, that scientific researchers want to perform. A workflow is not only the computing application, but a way of documenting a process. Science workflows may be of very different nature depending on the area of research, matching the actual experiment that the scientist want to perform. Workflow Management Systems are environments that offer the researchers tools to define, publish, execute and document their workflows.In some cases, the science workflows are used to generate data; in other cases are used to analyse existing data; only in a few cases, workflows are used both to generate and analyse data. The design of experiments is in some cases generated blindly, without a clear idea of which points are relevant to be computed/simulated, ending up with huge amount of computation that is performed following a brute-force strategy.However, the evolution of systems and the large amount of data generated by the applications require an in-situ analysis of the data, thus requiring new solutions to develop workflows that includes both the simulation/computational part and the analytic part. What is more, the fact that both components, computation and analytics, can be run together will enable the possibility of defining more dynamic workflows, with new computations being decided by the analytics in a more efficient way.The first part of the paper will review current approaches that a set of scientific communities follow in the development of their workflows. This paper does not intent to be exhaustive in the compilation of different approaches available to develop and deploy workflows. We focus on the Workflow Management Systems used by a set of scientific communities and their representative use cases, with the objective of understanding their different needs and requirements. The second part of the paper proposes a new software architecture to develop a new family of end-to-end workflows that enables the management of dynamic workflows composed of simulations, analytics and visualization, including inputs/outputs from streams.Keywords: workflows, scientific applications, Big Data.
IntroductionWorkflows appeared last century and have been used in the manufacturing industry as a mean to optimize their processes. Examples of traditional (non-IT) workflows can be found in the assembly lines, i.e., the Ford Model T assembly line standardized the production processes and was the first continuous delivery pipeline for the automotive industry. This process reduced the costs of manufacturing from $850 to $260 in 1924.The time and motion studies defined by Taylor [52] and Gilbreth [41] had significant impact in the manufacturing processes. These studies proposed to break manufacturing activities into small, simple steps, to determine with accuracy the amount of time required to perform each of the steps. Then, the sequence of movements taken by the employee has...