SC18: International Conference for High Performance Computing, Networking, Storage and Analysis 2018
DOI: 10.1109/sc.2018.00038

Runtime-Assisted Cache Coherence Deactivation in Task Parallel Programs

Abstract: With increasing core counts, the scalability of directory-based cache coherence has become a challenging problem. To reduce the area and power needs of the directory, recent proposals reduce its size by classifying data as private or shared, and disable coherence for private data. However, existing classification methods suffer from inaccuracies and require complex hardware support with limited scalability. This paper proposes a hardware/software co-designed approach: the runtime system identifies data that is…

Cited by 9 publications (8 citation statements)
References 64 publications
“…The dynamic task scheduler distributes ready tasks among all threads for asynchronous execution. This decoupling of the specification of the program from its dynamic execution eases programmability and enables many optimizations at the runtime system level in a generic and application-agnostic way [22], [24], [65], [75], [93].…”
Section: Task Dataflow Programming Models (mentioning)
confidence: 99%
“…The runtime system can transparently manage GPUs [7], [76], FPGA accelerators [18], [85], multi-node clusters [20], [27], [28], heterogeneous memories [4], [63], scratchpad memories [5], NUMA [81], [82] and cache coherent NUMA [21], [23] systems. Adding hardware support, the runtime system can guide cache replacement [37], [65], cache coherence deactivation [22], cache prefetching [47], [75], cache communication mechanisms in producer-consumer task relationships [64], [66], reliability and resilience [51]-[53], value approximation [19], and DVFS to accelerate critical tasks [26].…”
Section: Task Dataflow Programming Models (mentioning)
confidence: 99%
“…Caheny et al [25] use runtime information to selectively deactivate cache coherence. Barredo et al [12] propose a compaction-restoration unit to join sparse predicated vector instructions into denser vectors.…”
Section: Runtime-aware Architectures (mentioning)
confidence: 99%
“…Caheny et al [23,25] aim to reduce coherence traffic movement in NUMA systems by combining NUMA aware scheduling and data allocation. The same authors also propose to deactivate coherence for non-shared data as specified by the runtime system [24]. Sanchez et al [109] apply graph partitioning techniques to the TDG in order to reduce data transferences in NUMA systems.…”
Section: Exploiting Runtime System Information In the Architecture (mentioning)
confidence: 99%