2014
DOI: 10.1145/2692916.2555251

Leveraging hardware message passing for efficient thread synchronization

Abstract: As the level of parallelism in manycore processors keeps increasing, providing efficient mechanisms for thread synchronization in concurrent programs is becoming a major concern. On cache-coherent shared-memory processors, synchronization efficiency is ultimately limited by the performance of the underlying cache coherence protocol. This paper studies how hardware support for message passing can improve synchronization performance. Considering the ubiquitous problem of mutual exclusion, we adapt two state-of-th…

Cited by 3 publications (6 citation statements)
References: 28 publications
“…To remove the contention on locks and scale on massively parallel systems, message‐based counter technique allocates an extra thread running on a unique processor as an agent. The thread receives update requests from worker threads, and access counters on behalf of them.…”
Section: Related Work
mentioning
confidence: 99%
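The agent-thread idea in the statement above can be sketched in software. This is a minimal illustration only, not the cited authors' implementation: a `queue.Queue` stands in for the hardware message channel, and the names `run_delegated_counter`, `agent`, and `worker` are hypothetical. Because `queue.Queue` uses a lock internally, the sketch shows the structure of delegation, not its performance.

```python
import queue
import threading


def run_delegated_counter(num_workers: int, updates_per_worker: int) -> int:
    """Count updates by delegating every counter access to one agent thread."""
    # Assumption: a software queue plays the role of the message channel.
    requests: queue.Queue = queue.Queue()
    counter = 0  # touched only by the agent thread

    def agent() -> None:
        nonlocal counter
        while True:
            delta = requests.get()
            if delta is None:  # sentinel: no more requests will arrive
                break
            counter += delta  # sole writer, so no lock around the counter

    def worker() -> None:
        # Workers never touch the counter; they only send update requests.
        for _ in range(updates_per_worker):
            requests.put(1)

    agent_thread = threading.Thread(target=agent)
    agent_thread.start()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # All requests are queued before the sentinel, and the queue is FIFO,
    # so the agent applies every update before it stops.
    requests.put(None)
    agent_thread.join()
    return counter
```

Calling `run_delegated_counter(4, 1000)` returns 4000: each of the four workers sends 1000 requests, and the agent applies all of them on their behalf.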
“…Counting algorithm can be formulated as a reader‐writer problem, which can be solved by properly protecting the shared resource, counter, to make the read and write operations atomic to avoid intermediate states. Over the past decades, a variety of counting algorithms have been invented to meet requirements in algorithm efficiency, memory usage, and consistency of counting results. In the past decade, driven by Moore's Law, computer hardware and its parallelism evolved sharply.…”
Section: Introduction
mentioning
confidence: 99%
“…However, for emerging many-core processors, conventional coherent cache architecture has become more and more complex and it is very hard to achieve high performance [32]. A novel architectural feature, Explicit inter-core Message Passing (EMP), has gained popularity in research and even been used in some product many-core processors, such as TILE-Gx8036 [30] and SW26010 [13]. The Sunway TaihuLight [1] supercomputer is powered by SW26010 that uses EMP instead of coherent cache to share data among cores.…”
Section: Introduction
mentioning
confidence: 99%
“…Lock-free | Atomic instructions | Very Low | High
Delegation (shm version) [25] | Lock-free | Atomic instructions | Low | High
Transactional Memory [29] | Lock-free | Transactional memory instructions | Medium | Conflict rate dependent
Spinlock [4] | Lock-based | Atomic instructions | High | Low
POSIX mutex lock | Lock-based | OS dependent | High | Medium
Queue-lock [27] | Lock-based | Atomic instructions | High | Medium
Delegation (EMP version) [30] | Lock
EMP has been used to accelerate the request sending routine in RCL [30], and the performance is improved by 4.3×. On Sunway Taihulight [13], concurrent data classification obtains a speedup of 19.15× after changing synchronization method from shared-memory based locking to EMP based delegation [24].…”
mentioning
confidence: 99%