Proceedings of the 42nd Annual International Symposium on Computer Architecture 2015
DOI: 10.1145/2749469.2750396

MiSAR

Abstract: While numerous hardware synchronization mechanisms have been proposed, they either no longer function or suffer great performance loss when their hardware resources are exceeded, or they add significant complexity and cost to handle such resource overflows. Additionally, prior hardware synchronization proposals focus on one type (barrier or lock) of synchronization, so several mechanisms are likely to be needed to support real applications, many of which use locks, barriers, and/or condition variables. This pa…

Citations: cited by 13 publications (7 citation statements)
References: 24 publications
“…Most proposals focus on using dedicated hardware (accelerators, networks, new instructions) to speed-up synchronization primitives [1,6,25,41,47,55,61,62,69]. However, these proposals incur large area overheads.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, these approaches require support for Floating Operations in the cache hierarchy. Academia and industry have proposed a plethora of alternatives to near and far AMOs, introducing new synchronization instructions [3,6,47,55,61,62]. However, most ISAs are reticent to include instructions.…”
Section: Related Work (mentioning)
confidence: 99%
“…ElTantawy, et al [13] propose a hardware warp scheduling policy that reduces lock retries by de-prioritizing warps whose threads are spin waiting. In addition, hardware accelerated locks have also been proposed for CPUs [4,25,42,47].…”
Section: GPU Solutions (mentioning)
confidence: 99%
“…synchronization, become expensive by default and can degrade performance of benchmarks that use it. Liang et al have demonstrated that poorly implementing synchronization can reduce performance by approximately 40% in widely used benchmarks despite representing a small fraction of the code [22]. In message passing systems, algorithms that make frequent use of collective communication patterns will see their performance significantly reduced when scaled.…”
Section: Impact of Multicast Performance on the Manycore Architecture (mentioning)
confidence: 99%
“…Some machines have provided advanced hardware support, such as the barrier network in Cray T3D [252], the collectives network in Blue Gene/L [253], and the fetch-and-Φ operations in the SGI Origin [254]. Also, there are multiple research proposals for advanced hardware support for synchronization (e.g., [22,60,62,255,256,257]). Yet as technology scaling delivers larger and larger manycore chips, these patterns are expected to remain costly to support within the chip.…”
Section: WiSync: An Architecture for Fast On-Chip Synchronization (mentioning)
confidence: 99%