Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems 2014
DOI: 10.1145/2541940.2541987

Fence-free work stealing on bounded TSO processors

Abstract: Work stealing is the method of choice for load balancing in task parallel programming languages and frameworks. Yet despite considerable effort invested in optimizing work stealing task queues, existing algorithms issue a costly memory fence when removing a task, and these fences are believed to be necessary for correctness. This paper refutes this belief, demonstrating work stealing algorithms in which a worker does not issue a memory fence for microarchitectures with a bounded total store ordering (TSO) memo…

Cited by 15 publications (21 citation statements)
References 30 publications
“…Targeting scheduling systems for task-based programs, a large amount of prior work aims to improve energy-efficiency [38,41], to improve data locality [9,10], or to reduce scheduling overhead [17,29]. However, with the increasing bandwidth requirements of computing tasks, many papers have also conducted related research for efficient bandwidth usage.…”
Section: Related Work; type: mentioning
confidence: 99%
“…We further adapt the echo method [29] to make non-owner lock acquisition speed comparable to standard locks, assuming the owner acquires the lock frequently. Because our lock does not rely on blocking safe points, it (1) can be used in C/C++ programs, which do not naturally define safe points, and (2) enables non-owner acquisition even if the owner is scheduled out or delayed.…”
Section: Safe Memory Reclamation (§ 4); type: mentioning
confidence: 99%
“…This signals to T1 that T0 is waiting to acquire L, so T1 can stop the ∆ delay and enter the critical section. To implement this notification we use echoing [29]: We expand the flags to 64 bits, 63 of which are used as version numbers that uniquely identify each write; whenever T1 writes to flag1, it increases flag1's version. T0 uses this version to notify T1 that it is spinning while trying to acquire L, by writing (or echoing) what it reads from flag1 into flag0 (Lines 59-63).…”
Section: Ffbl Algorithm; type: mentioning
confidence: 99%
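
The versioned-flag echo described in the statement above can be illustrated with a minimal, single-threaded sketch. It is not the citing paper's code: the bit layout (flag value in the lowest bit, version in the upper 63 bits), the class `EchoFlags`, and all method names are assumptions made for illustration, and real implementations would use atomic 64-bit loads and stores between two threads.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit machine word


def pack(version: int, bit: int) -> int:
    """Pack a 63-bit version and a 1-bit flag value into one 64-bit word.

    Assumed layout (not stated in the source): value in bit 0, version above it.
    """
    return ((version << 1) | (bit & 1)) & MASK64


def version_of(word: int) -> int:
    """Extract the 63-bit version from a packed word."""
    return word >> 1


class EchoFlags:
    """Hypothetical model of the two flags shared by threads T0 and T1."""

    def __init__(self) -> None:
        self.flag0 = pack(0, 0)  # written by T0
        self.flag1 = pack(0, 0)  # written by T1

    def t1_write(self, bit: int) -> int:
        # Every write by T1 bumps the version, so each write is unique.
        self.flag1 = pack(version_of(self.flag1) + 1, bit)
        return self.flag1

    def t0_echo(self) -> None:
        # T0 signals that it is spinning by copying what it read from flag1.
        self.flag0 = self.flag1

    def t1_sees_echo(self, last_written: int) -> bool:
        # T1 knows T0 observed its latest write only if the echo matches it.
        return self.flag0 == last_written
```

Because the version changes on every write, an echo of an older write never matches T1's latest word, which is what lets T1 distinguish a fresh notification from a stale one.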