2015
DOI: 10.1145/2742351
Noise-Tolerant Explicit Stencil Computations for Nonuniform Process Execution Rates

Abstract: Next-generation HPC computing platforms are likely to be characterized by significant, unpredictable nonuniformities in execution time among compute nodes and cores. The resulting load imbalances are expected to arise from a variety of sources: manufacturing discrepancies, dynamic power management, runtime component failure, OS jitter, software-mediated resiliency, and TLB/cache performance variations, for example. It is well understood that existing algorithms with frequent points of b…
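The abstract's subject is the classic bulk-synchronous explicit stencil computation. As illustrative background (a minimal sketch, not code from the paper), a 1D Jacobi-style heat-equation update shows the structure being refactored; in a distributed run, each sweep would be followed by a halo exchange, which is the synchronization point that makes the traditional algorithm noise-sensitive:

```python
import numpy as np

def jacobi_step(u, alpha=0.25):
    """One explicit stencil sweep of the 1D heat equation:
    u_new[i] = u[i] + alpha*(u[i-1] - 2*u[i] + u[i+1]),
    with fixed boundary values. With alpha=0.25 the update is a
    convex combination of neighbors, so values stay bounded."""
    u_new = u.copy()
    u_new[1:-1] = u[1:-1] + alpha * (u[:-2] - 2.0 * u[1:-1] + u[2:])
    return u_new

u = np.zeros(11)
u[5] = 1.0                      # point source in the middle
for _ in range(50):
    u = jacobi_step(u)          # heat diffuses outward each sweep
```

The function name and grid size here are illustrative only; the paper's actual formulation operates on distributed multidimensional grids.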

Cited by 11 publications (6 citation statements). References 50 publications.
“…In [14], the authors introduced a variation-aware algorithm that improves application performance under a power constraint by determining module-level (individual processor and associated DRAM) power allocations, achieving up to 5.4× speedup. The authors of [12] proposed parallel algorithms that tolerate variability and non-uniformity by decoupling per-process communication across the available CPUs. Acun et al. [2] reduced energy variation on Ivy Bridge and Sandy Bridge processors by disabling the Turbo Boost feature, stabilizing execution time across a set of processors.…”
Section: Related Work
confidence: 99%
“…Jitter in HPC is typically described as performance loss caused by the competition for resources between background processes and applications, or application interference [37,17]. Some OS jitter researchers have simulated these effects at scale [9], while others have proposed applications [12] or systems [4] that can account for these types of variability. Additionally, schedulers are often identified as causes of significant variability in HPC systems [32] and might be classified as a form of jitter.…”
Section: Related Work
confidence: 99%
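The cost model behind these jitter observations can be sketched with a toy simulation (all numbers and names here are illustrative, not measurements from any cited work): in a bulk-synchronous run, every step ends at a global barrier, so each step costs the *maximum* of the per-process times, and random detours are amplified as the process count grows:

```python
import random

def bsp_runtime(n_procs, n_steps, work=1.0, jitter=0.2, seed=0):
    """Total time of a bulk-synchronous computation: each step's
    cost is the maximum over processes of (work + random jitter),
    modeling OS noise hitting processes independently."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_steps):
        total += max(work + rng.uniform(0.0, jitter)
                     for _ in range(n_procs))
    return total

small = bsp_runtime(n_procs=4, n_steps=100)     # few processes
large = bsp_runtime(n_procs=4096, n_steps=100)  # per-step max near worst case
```

With many processes, the per-step maximum approaches the worst-case jitter every step, which is why noise that is negligible on one node degrades tightly synchronized applications at scale.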
“…Such variability is cited as a significant barrier to exascale computing [29,18]. Unfortunately, variability (which includes OS jitter [21]) is both ubiquitous and elusive as its causes pervade and obscure performance across the systems stack from hardware [14,23] to middleware [20,1] to applications [12] to extreme-scale systems [29,35]. Additionally, as a number of recent reports attest [29,18], performance variability at scale can significantly reduce performance and energy efficiency [9,4].…”
confidence: 99%
“…They demonstrated that by refactoring explicit stencil calculations, they were able to achieve speedups of 1–37× over a traditional noise-sensitive algorithm in an uncoordinated timing environment [20]. Beyond HPC, the impact of timing on more general concurrency scenarios is also interesting. Google's Spanner uses coordinated time to achieve scalability and guarantee correctness of its distributed concurrency features, such as externally consistent transactions, lock-free read-only transactions, and non-blocking reads.…”
Section: How Precise Is Enough? The Impact of Time Synchronization …
confidence: 99%
“…In perhaps the largest impact reported, Hammouda et al. studied a classic bulk-synchronous implementation of an explicit stencil calculation in a range of environments, spanning from the absence of random detours to an increasingly large presence of random detours. They demonstrated that by refactoring explicit stencil calculations, they were able to achieve speedups of 1–37× over a traditional noise-sensitive algorithm in an uncoordinated timing environment …”
Section: Importance of Time Agreement
confidence: 99%