Naiad

Murray, Derek G.; McSherry, Frank; Isaacs, Rebecca; Isard, Michael; Barham, Paul; Abadi, Martín

doi:10.1145/2517349.2522738

Cited by 508 publications

(28 citation statements)

References 34 publications

“…HPVM [45] extends the LLVM IR by introducing hierarchical dataflow graphs for mapping to accelerators, yet still lacks a high-level view and explicit state machines that SDFGs offer. Other representations include Bamboo [73], an object-oriented dataflow model that tracks state locally through data structure mutation over the course of the program; Dryad [39] and Naiad [52], parametric graphs intended for coarse-grained distributed data-parallel applications, where Naiad extends Dryad with definition of loops in a nested context; simplified data dependency graphs for optimization of GPU applications [70]; deterministic producer/consumer graphs [15]; and other combinations of task DAGs with data movement [32]. As the SDFG provides general-purpose state machines with dataflow, all the above models can be fully represented within it, where SDFGs have the added benefit of encapsulating fine-grained data dependencies.…”

Section: Related Work (mentioning)

confidence: 99%

“…The checks ensure that the array is indeed transient and not used in other instances of data access nodes. To avoid recomputing subsets (which may not be feasible to compute symbolically), if the transformation operates in strict mode, it only matches two arrays of the same shape (lines 51-56). The transformation then operates in a straightforward manner, renaming the memlets to point to the second (not removed) array (lines 66-70) and redirecting dataflow edges to that data access node (lines 73-74).…”

Section: Polybench Flags (mentioning)

confidence: 99%

Stateful dataflow multigraphs

Ben-Nun

Licht

Ziogas

et al. 2019

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

The ubiquity of accelerators in high-performance computing has driven programming complexity beyond the skill-set of the average domain scientist. To maintain performance portability in the future, it is imperative to decouple architecture-specific programming paradigms from the underlying scientific computations. We present the Stateful DataFlow multiGraph (SDFG), a data-centric intermediate representation that enables separating program definition from its optimization. By combining fine-grained data dependencies with high-level control-flow, SDFGs are both expressive and amenable to program transformations, such as tiling and double-buffering. These transformations are applied to the SDFG in an interactive process, using extensible pattern matching, graph rewriting, and a graphical user interface. We demonstrate SDFGs on CPUs, GPUs, and FPGAs over various motifs, from fundamental computational kernels to graph analytics. We show that SDFGs deliver competitive performance, allowing domain scientists to develop applications naturally and port them to approach peak hardware performance without modifying the original scientific code.

HPC programmers have long sacrificed ease of programming and portability for achieving better performance. This mindset was established at a time when computer nodes had a single processor/core and were programmed with C/Fortran and MPI. The last decade, witnessing the end of Dennard scaling and Moore's law, brought a flurry of new technologies into the compute nodes. Those range from simple multi-core and manycore CPUs to heterogeneous GPUs and specialized FPGAs. To support those architectures, the complexity of OpenMP's specification grew by more than an order of magnitude from 63 pages in OpenMP 1.0 to 666 pages in OpenMP 5.0. This one example illustrates how (performance) programming complexity shifted from network scalability to node

“…Aurora [2], its distributed counterpart, Borealis [1] and STREAM [11] are some of the early prototypes of stream processing engines that make dynamic scheduling decisions. Many recent stream processing engines (NiagaraST [31], Nile [24], Naiad [33], Spark Streaming [41], Storm [8], S4 [35]) also make scheduling decisions during runtime. All these systems either focus on single core or shared-nothing architectures.…”

Section: Dynamic Solutions (mentioning)

confidence: 99%

Scaling Ordered Stream Processing on Shared-Memory Multicores

Prasaad

Ramalingam

Rajan

2019

Proceedings of Real-Time Business Intelligence and Analytics

Many modern applications require real-time processing of large volumes of high-speed data. Such data processing needs can be modeled as a streaming computation. A streaming computation is specified as a dataflow graph that exposes multiple opportunities for parallelizing its execution, in the form of data, pipeline and task parallelism. On the other hand, many important applications require that processing of the stream be ordered, where inputs are processed in the same order as they arrive. There is a fundamental conflict between ordered processing and parallelizing the streaming computation. This paper focuses on the problem of effectively parallelizing ordered streaming computations on a shared-memory multicore machine.

We first address the key challenges in exploiting data parallelism in the ordered setting. We present a low-latency, non-blocking concurrent data structure to order outputs produced by concurrent workers on an operator. We also propose a new approach to parallelizing partitioned stateful operators that can handle load imbalance across partitions effectively and mostly avoid delays due to ordering. We illustrate the trade-offs and effectiveness of our concurrent data-structures on micro-benchmarks and streaming queries from the TPCx-BB [16] benchmark. We then present an adaptive runtime that dynamically maps the exposed parallelism in the computation to that of the machine. We propose several intuitive scheduling heuristics and compare them empirically on the TPCx-BB queries. We find that for streaming computations, heuristics that exploit as much pipeline parallelism as possible perform better than those that seek to exploit data parallelism.

“…More recently there have been proposals for implementing DSPS on Cloud infrastructure such as Stormy [49], taking advantage of its elastic characteristics (i.e., easily adding and removing nodes from the system). Additionally, systems such as Naiad [50] combine DSPS with batch processing techniques, allowing complex incremental computations on streaming data.…”

Section: Distributed Stream Processing Systems (mentioning)

confidence: 99%

Radiator - efficient message propagation in context-aware systems

Alves

Ferreira

2014

J Internet Serv Appl

Applications such as Facebook, Twitter and Foursquare have brought the mass adoption of personal short messages, distributed in (soft) real-time on the Internet to a large number of users. These messages are complemented with rich contextual information such as the identity, time and location of the person sending the message (e.g., Foursquare has millions of users sharing their location on a regular basis, with almost 1 million updates per day). Such contextual messages raise serious concerns in terms of scalability and delivery delay; this results not only from their huge number but also because the set of user recipients changes for each message (as their interests continuously change), preventing the use of well-known solutions such as pub-sub and multicast trees. This leads to the use of non-scalable broadcast-based solutions or point-to-point messaging.

We propose Radiator, a middleware to assist application programmers implementing efficient context propagation mechanisms within their applications. Based on each user's current context, Radiator continuously adapts each message propagation path and delivery delay, making efficient use of network bandwidth, arguably the biggest bottleneck in the deployment of large-scale context propagation systems. Our experimental results demonstrate a 20x reduction in consumed bandwidth without affecting the real-time usefulness of the propagated messages.
