“…The actual hardware also has separate physical memories, as shown in Figure 1.1: CPUs have their own DRAM and GPUs also have their own DRAM. The GPU's DRAM memory space is called global memory or device memory.…”
Section: Device and Global Memory
“…Since there are n² elements of C, the work is W(n) = Θ(n³). To compute D(n), observe that the n² output elements have no dependencies among them; the only dependencies occur during the reduction to compute each output element. Thus, D(n) = Θ(log n).…”
Section: C(i,j) … B(:,j)
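The work–depth claim above can be made concrete with a small sketch. The helper below is hypothetical (not from the lecture): it counts the scalar multiplies of a dense n × n matrix multiplication (W(n) = n³) and the span of a balanced binary-tree reduction over each output's n products (D(n) ≈ log₂ n, plus one level for the multiplies).

```python
import math

def matmul_work_and_depth(n):
    """Work and depth of dense n x n matrix multiply, assuming each of
    the n^2 outputs is reduced with a balanced binary tree over n
    products (illustrative model, not the lecture's exact accounting)."""
    work = n ** 3  # n^2 outputs, n scalar multiplies each
    # One level for the elementwise multiplies, then ceil(log2 n)
    # levels for the tree reduction; a single product needs no reduction.
    depth = math.ceil(math.log2(n)) + 1 if n > 1 else 1
    return work, depth
```

For n = 8 this gives work 512 and depth 4, matching the Θ(n³) work and Θ(log n) depth stated in the snippet.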
“…with high probability [2]. One might have reasonably guessed an upper bound of p · Q₁(n; Z, L), i.e., that the algorithm could incur as many as p times more misses in the multithreaded case, but Equation (2.2) shows otherwise.…”
Section: Combined Analyses of Parallelism and I/O-Efficiency
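The quantity Q₁(n; Z, L) above is the serial miss count against an ideal cache of Z bytes with L-byte lines. A minimal sketch of how such a count is obtained, assuming a fully associative cache with LRU replacement (the standard ideal-cache model; the function name is our own):

```python
from collections import OrderedDict

def lru_misses(trace, num_lines):
    """Count misses of a cache-line address trace against a fully
    associative LRU cache holding `num_lines` lines (i.e., Z/L lines
    in the ideal-cache model)."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)  # hit: refresh LRU position
        else:
            misses += 1              # cold or capacity miss
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict least recently used
    return misses
```

For example, the trace 0, 1, 2, 0, 1, 2 incurs only 3 cold misses with 3 lines of capacity, but misses on every access with 2 lines.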
“…The algorithm is complex and its details are beyond the scope of this synthesis lecture; we instead refer the interested reader elsewhere for details [43,101]. For simplicity, we assume the number of source and target points is the same, though in general they may differ. Moreover, the sets of points may in fact be exactly the same, with the sum excluding the computed target.…”
Section: … where φ(t) is called the potential at target point t; δ(s) …
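The potential sum that the fast algorithm approximates can be evaluated directly in O(n²) time. The sketch below is illustrative only: it assumes sources and targets coincide (with the self-term excluded, as the snippet notes), a 1/r kernel, and 1-D points; the function and variable names are our own.

```python
def direct_potentials(points, densities):
    """Direct O(n^2) evaluation of phi(t) = sum over s != t of
    delta(s) / |t - s|, with sources and targets the same point set.
    Assumes a 1/r kernel and 1-D coordinates for simplicity."""
    n = len(points)
    phi = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # exclude the computed target from the sum
            total += densities[j] / abs(points[i] - points[j])
        phi.append(total)
    return phi
```

Fast summation methods such as the one referenced above reduce this quadratic cost to roughly linear at a controlled approximation error.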
“…The roofline sets an upper bound on the performance of a kernel, depending on the kernel's operational intensity. When the column at the kernel's operational intensity hits the flat part of the roof, performance is compute bound; when it hits the slanted part, performance is ultimately memory bound.…”
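The roofline bound described above reduces to a single min expression. A minimal sketch, with hypothetical parameter names and example peak numbers:

```python
def roofline(peak_gflops, peak_bw_gbs, oi):
    """Roofline model: attainable performance is the lesser of peak
    compute and (memory bandwidth x operational intensity).

    peak_gflops : peak compute rate, GFLOP/s (the flat roof)
    peak_bw_gbs : peak memory bandwidth, GB/s (slope of the slanted roof)
    oi          : operational intensity, FLOP/byte
    """
    attainable = min(peak_gflops, peak_bw_gbs * oi)
    # If the bandwidth line at this intensity reaches the flat roof,
    # the kernel is compute bound; otherwise it is memory bound.
    bound = "compute" if peak_bw_gbs * oi >= peak_gflops else "memory"
    return attainable, bound
```

For a machine with a 100 GFLOP/s peak and 10 GB/s of bandwidth, the ridge point sits at 10 FLOP/byte: kernels below it are capped by memory traffic, kernels above it by compute.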
This paper proposes a methodology for studying the data reuse quality of task-parallel runtimes. We introduce an extension of the reuse distance method called the Kernel Reuse Distance (KRD). The metric is a low-overhead alternative designed to analyze data reuse at the socket level while minimizing perturbation of the parallel schedule. Using the KRD metric, we show that reuse depends considerably on the system configuration (sockets, cores) and on the runtime scheduler. Furthermore, we correlate KRD with hardware metrics such as cache misses and work-time inflation. Overall, we found that KRD can be used effectively to assess data reuse in parallel applications. The study also revealed that several current runtimes suffer from severe bottlenecks at scale, which often dominate performance.
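The reuse distance that KRD extends measures, for each memory access, how many distinct addresses were touched since the previous access to the same address. A minimal serial sketch (the function name is our own; KRD itself adds kernel- and socket-level extensions not shown here):

```python
def reuse_distances(trace):
    """Reuse (stack) distance for each access in `trace`: the number of
    distinct addresses touched since the last access to the same
    address, or infinity for a first access. Quadratic-time sketch;
    production tools use a tree or hash structure instead."""
    distances = []
    for i, addr in enumerate(trace):
        prior = [j for j in range(i) if trace[j] == addr]
        if not prior:
            distances.append(float("inf"))  # cold access
            continue
        last = prior[-1]
        distances.append(len(set(trace[last + 1:i])))
    return distances
```

Small reuse distances indicate accesses likely to hit in cache, which is why the abstract can correlate the metric with measured cache misses.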