Many classes of high-performance applications and combinatorial problems exhibit a large degree of runtime load variability. One approach to achieving balanced resource use is to over-decompose the problem into fine-grained tasks that are then dynamically balanced using techniques such as work stealing. Existing work-stealing techniques for such irregular applications, running on large clusters, exhibit high overheads due to potentially untimely interruption of busy nodes, excessive communication messages, and delays experienced by idle nodes in finding work after repeated failed steals. We contend that the fundamental problem of distributed work stealing is that of rapidly bringing together work producers and consumers. In response, we develop an algorithm that performs timely, lightweight, and highly efficient matchmaking between work producers and consumers, which results in accurate load balance. Experimental evaluations show that our scheduler outperforms other distributed work-stealing schedulers and achieves scale beyond what is possible with current approaches.