Reducing control overhead in dataflow architectures

Petersen, Andrew; Putnam, Andrew; Mercaldi, Martha; Schwerin, Andrew; Eggers, Susan J.; Swanson, Steve; Oskin, Mark

doi:10.1145/1152154.1152184

Cited by 7 publications

(1 citation statement)

References 36 publications


“…In multi-threading the cost of thread creation, termination and switching determines the efficiency of the parallel thread model. This particular field has sought to reduce these costs by combinations of compiler techniques and runtime representations [10], [11] together with hardware support [12], [13]. Closer to home, the implementation of Cilk-5 redesigns its runtime in order to incur the least amount of overhead at execution time.…”

Section: Introduction and Related Work (mentioning)

confidence: 99%

A Study of Speculative Distributed Scheduling on the Cell/B.E.

Bellens, Perez, Badía, et al. 2011

2011 IEEE International Parallel & Distributed Processing Symposium


Star Superscalar's (StarSs) programming model converts a sequential application in C or Fortran into an efficient parallel program. The resulting parallel code is highly dynamic in the sense that data analysis and task scheduling occur at run-time, while the application executes. In this paper we compare this approach to the strategy adopted by other multicore programming environments. The price to pay for dynamic scheduling and dependence tracking is higher runtime overhead. We propose a distributed scheduler for Task Dependence Graphs (TDGs) to attenuate the scheduling cost in heterogeneous multicore architectures. This scheduler allows the cores to speculatively select tasks from a conservative estimate of the TDG. In case of conflicts or lack of tasks, a lightweight centralized scheduler services the faulting core, after which the latter resumes its participation in the distributed scheme. Experiments with Cell Superscalar (CellSs) on a representative set of benchmarks demonstrate the reduction in runtime overhead achieved by the distributed scheduler. This reduction in runtime overhead carries over directly to a performance improvement for a large fraction of the benchmarks.

