“…The actual hardware also has separate physical memories, as shown in Figure 1.1: CPUs have their own DRAM and GPUs also have their own DRAM. The GPU's DRAM memory space is called global memory or device memory.…”
Section: Device and Global Memory
“…Since there are n² elements of C, the work is W(n) = Θ(n³). To compute D(n), observe that the n² output elements have no dependencies among them; the only dependencies occur during the reduction to compute each output element. Thus, D(n) = Θ(log n).…”
Section: C(i,j) … B(:,j)
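The work–depth claim above can be made concrete with a small sketch. The helper below is hypothetical (not from the lecture): it counts the scalar multiplies of a dense n × n matrix multiplication (W(n) = n³) and the span of a balanced binary-tree reduction over each output's n products (D(n) ≈ log₂ n, plus one level for the multiplies).

```python
import math

def matmul_work_and_depth(n):
    """Work and depth of dense n x n matrix multiply, assuming each of
    the n^2 outputs is reduced with a balanced binary tree over n
    products (illustrative model, not the lecture's exact accounting)."""
    work = n ** 3  # n^2 outputs, n scalar multiplies each
    # One level for the elementwise multiplies, then ceil(log2 n)
    # levels for the tree reduction; a single product needs no reduction.
    depth = math.ceil(math.log2(n)) + 1 if n > 1 else 1
    return work, depth
```

For n = 8 this gives work 512 and depth 4, matching the Θ(n³) work and Θ(log n) depth stated in the snippet.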
“…with high probability [2]. One might have reasonably guessed an upper bound of p · Q₁(n; Z, L), i.e., that the algorithm could incur as many as p times more misses in the multithreaded case, but Equation (2.2) shows otherwise.…”
Section: Combined Analyses of Parallelism and I/O-Efficiency
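The quantity Q₁(n; Z, L) above is the serial miss count against an ideal cache of Z bytes with L-byte lines. A minimal sketch of how such a count is obtained, assuming a fully associative cache with LRU replacement (the standard ideal-cache model; the function name is our own):

```python
from collections import OrderedDict

def lru_misses(trace, num_lines):
    """Count misses of a cache-line address trace against a fully
    associative LRU cache holding `num_lines` lines (i.e., Z/L lines
    in the ideal-cache model)."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)  # hit: refresh LRU position
        else:
            misses += 1              # cold or capacity miss
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict least recently used
    return misses
```

For example, the trace 0, 1, 2, 0, 1, 2 incurs only 3 cold misses with 3 lines of capacity, but misses on every access with 2 lines.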
“…The algorithm is complex and its details are beyond the scope of this synthesis lecture; we instead refer the interested reader elsewhere for details [43,101]. For simplicity, we assume the number of source and target points is the same, though in general they may differ. Moreover, the sets of points may in fact be exactly the same, with the sum excluding the computed target.…”
Section: … where φ(t) is called the potential at target point t; δ(s) …
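The potential sum that the fast algorithm approximates can be evaluated directly in O(n²) time. The sketch below is illustrative only: it assumes sources and targets coincide (with the self-term excluded, as the snippet notes), a 1/r kernel, and 1-D points; the function and variable names are our own.

```python
def direct_potentials(points, densities):
    """Direct O(n^2) evaluation of phi(t) = sum over s != t of
    delta(s) / |t - s|, with sources and targets the same point set.
    Assumes a 1/r kernel and 1-D coordinates for simplicity."""
    n = len(points)
    phi = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # exclude the computed target from the sum
            total += densities[j] / abs(points[i] - points[j])
        phi.append(total)
    return phi
```

Fast summation methods such as the one referenced above reduce this quadratic cost to roughly linear at a controlled approximation error.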
“…The roofline sets an upper bound on the performance of a kernel, depending on the kernel's operational intensity. When the column at the kernel's operational intensity hits the flat part of the roof, performance is compute bound; when it hits the slanted part, performance is ultimately memory bound.…”
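The roofline bound described above reduces to a single min expression. A minimal sketch, with hypothetical parameter names and example peak numbers:

```python
def roofline(peak_gflops, peak_bw_gbs, oi):
    """Roofline model: attainable performance is the lesser of peak
    compute and (memory bandwidth x operational intensity).

    peak_gflops : peak compute rate, GFLOP/s (the flat roof)
    peak_bw_gbs : peak memory bandwidth, GB/s (slope of the slanted roof)
    oi          : operational intensity, FLOP/byte
    """
    attainable = min(peak_gflops, peak_bw_gbs * oi)
    # If the bandwidth line at this intensity reaches the flat roof,
    # the kernel is compute bound; otherwise it is memory bound.
    bound = "compute" if peak_bw_gbs * oi >= peak_gflops else "memory"
    return attainable, bound
```

For a machine with a 100 GFLOP/s peak and 10 GB/s of bandwidth, the ridge point sits at 10 FLOP/byte: kernels below it are capped by memory traffic, kernels above it by compute.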
This paper proposes a methodology for studying the data reuse quality of task-parallel runtimes. We introduce an extension of the reuse distance method called the Kernel Reuse Distance (KRD). The metric is a low-overhead alternative designed to analyze data reuse at the socket level while minimizing perturbation of the parallel schedule. Using the KRD metric, we show that reuse depends considerably on the system configuration (sockets, cores) and on the runtime scheduler. Furthermore, we correlate KRD with hardware metrics such as cache misses and work-time inflation. Overall, we found that KRD can be used effectively to assess data reuse in parallel applications. The study also revealed that several current runtimes suffer from severe bottlenecks at scale, which often dominate performance.
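The reuse distance that KRD extends measures, for each memory access, how many distinct addresses were touched since the previous access to the same address. A minimal serial sketch (the function name is our own; KRD itself adds kernel- and socket-level extensions not shown here):

```python
def reuse_distances(trace):
    """Reuse (stack) distance for each access in `trace`: the number of
    distinct addresses touched since the last access to the same
    address, or infinity for a first access. Quadratic-time sketch;
    production tools use a tree or hash structure instead."""
    distances = []
    for i, addr in enumerate(trace):
        prior = [j for j in range(i) if trace[j] == addr]
        if not prior:
            distances.append(float("inf"))  # cold access
            continue
        last = prior[-1]
        distances.append(len(set(trace[last + 1:i])))
    return distances
```

Small reuse distances indicate accesses likely to hit in cache, which is why the abstract can correlate the metric with measured cache misses.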