Dataflow applications, such as machine learning algorithms, can run for days, making it desirable to have assurances that they will work correctly. Current tools are not good enough: too often the interactions between tasks are not type-safe, leading to undesirable runtime errors. This paper presents a new declarative Haskell Embedded DSL (eDSL) for dataflow programming: CircuitFlow. Defined as a Symmetric Monoidal Preorder (SMP) on data that models dependencies in the workflow, it has a strong mathematical basis, refocusing on how data flows through an application, resulting in a more expressive solution that not only catches errors statically, but also achieves competitive run-time performance. In our preliminary evaluation, CircuitFlow outperforms the industry-leading Luigi library of Spotify by scaling better with the number of inputs. The innovative creation of CircuitFlow is also of note, exemplifying how to create a modular eDSL whose semantics necessitates effects, and where storing complex type information for program correctness is paramount.
Dataflow applications, such as machine learning algorithms, can run for days, making it desirable to have assurances that they will work correctly. Current tools are not good enough: too often the interactions between tasks are not type-safe, leading to undesirable runtime errors. This paper presents a new declarative Haskell Embedded DSL (eDSL) for dataflow programming: CircuitFlow. Defined as a Symmetric Monoidal Preorder (SMP) on data that models dependencies in the workflow, it has a strong mathematical basis, refocusing on how data flows through an application, resulting in a more expressive solution that not only catches errors statically, but also achieves competitive run-time performance. In our preliminary evaluation, CircuitFlow outperforms the industry-leading Luigi library of Spotify by scaling better with the number of inputs. The innovative creation of CircuitFlow is also of note, exemplifying how to create a modular eDSL whose semantics necessitates effects, and where storing complex type information for program correctness is paramount.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.