Steal Locally, Share Globally

Tousimojarad, Ashkan; Vanderbauwhede, Wim

doi:10.1007/s10766-015-0350-0

Cited by 2 publications (3 citation statements)

References 22 publications


“…Authors in [32] used a thread pool with work-stealing and compared it to OpenMP, Cilk Plus, and TBB. They used Fibonacci as an example of unbalanced tasks.…”

Section: Related Work (mentioning; confidence: 99%)

Scaling Monte Carlo Tree Search on Intel Xeon Phi

Mirsoleimani, Plaat, Herik, et al., 2015

2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS)


Abstract. Many algorithms have been parallelized successfully on the Intel Xeon Phi coprocessor, especially those with regular, balanced, and predictable data access patterns and instruction flows. Irregular and unbalanced algorithms are harder to parallelize efficiently. They are present, for instance, in artificial intelligence search algorithms such as Monte Carlo Tree Search (MCTS). In this paper we study the scaling behavior of MCTS in a highly optimized real-world application on real hardware. The Intel Xeon Phi allows shared-memory scaling studies of up to 61 cores and 244 hardware threads. We compare work-stealing (Cilk Plus and TBB) and work-sharing (FIFO scheduling) approaches. Interestingly, we find that a straightforward thread pool with a work-sharing FIFO queue shows the best performance. A crucial element of this performance is controlling the grain size, an approach that we call Grain Size Controlled Parallel MCTS. A subsequent comparison with Xeon CPUs shows an even clearer performance distinction between the threading libraries. To the best of our knowledge, we achieve the fastest implementation of parallel MCTS on the 61-core Intel Xeon Phi using a real application (a speedup of 47 relative to a sequential run).
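The citation above describes a thread pool with work-stealing, tested on Fibonacci as an example of unbalanced tasks. As a hedged illustration of the general technique only (not the implementation in [32] or in the cited paper), the Python sketch below gives each worker its own deque: the owner pops its newest task (LIFO), while an idle worker steals the oldest task (FIFO) from a randomly chosen victim. The names `WorkStealingPool` and `fib` are hypothetical, tasks are assumed not to spawn further tasks, and Python threads are limited by the GIL, so this shows the scheduling policy, not a speedup.

```python
import collections
import random
import threading
from typing import Callable, List, Optional


def fib(n: int) -> int:
    """Naive recursive Fibonacci; cost grows roughly exponentially with n,
    so tasks fib(0) .. fib(19) are deliberately unbalanced."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


class WorkStealingPool:
    """Minimal work-stealing pool (hypothetical, illustrative only).

    Each worker owns a deque. The owner takes work from the back (LIFO);
    an idle worker steals from the front (FIFO) of a random victim.
    Tasks here never spawn new tasks, so a worker may stop as soon as it
    finds every deque empty.
    """

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        self.deques = [collections.deque() for _ in range(num_workers)]
        self.locks = [threading.Lock() for _ in range(num_workers)]

    def submit(self, worker: int, task: Callable[[], object]) -> None:
        with self.locks[worker]:
            self.deques[worker].append(task)

    def _next_task(self, me: int) -> Optional[Callable[[], object]]:
        # Local first: pop the most recently pushed task from our own deque.
        with self.locks[me]:
            if self.deques[me]:
                return self.deques[me].pop()
        # Otherwise steal the oldest task from a randomly ordered victim list.
        victims = [v for v in range(self.num_workers) if v != me]
        random.shuffle(victims)
        for v in victims:
            with self.locks[v]:
                if self.deques[v]:
                    return self.deques[v].popleft()
        return None

    def run(self) -> List[object]:
        results: List[object] = []
        results_lock = threading.Lock()

        def worker(me: int) -> None:
            while (task := self._next_task(me)) is not None:
                value = task()
                with results_lock:
                    results.append(value)

        threads = [threading.Thread(target=worker, args=(i,))
                   for i in range(self.num_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results


if __name__ == "__main__":
    pool = WorkStealingPool(num_workers=4)
    # Deliberately unbalanced start: every task is queued on worker 0,
    # so the other three workers only get work by stealing.
    for n in range(20):
        pool.submit(0, lambda n=n: (n, fib(n)))
    results = dict(pool.run())
    print(results[19])  # 4181
```

Stealing from the opposite end of the victim's deque is the usual design choice: the owner and thieves rarely compete for the same task, and in recursive workloads the oldest task tends to be the largest piece of remaining work.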

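The abstract attributes much of the FIFO thread pool's performance to controlling the grain size. The Python sketch below is a hedged, generic illustration of that mechanism, not the authors' Grain Size Controlled Parallel MCTS: a fixed number of independent iterations is grouped into tasks of at most `grain_size` iterations, pushed onto one shared FIFO queue, and drained by a fixed set of worker threads. The function `run_work_sharing` and its `work` callback are hypothetical stand-ins for real per-iteration work such as an MCTS playout.

```python
import queue
import threading
from typing import Callable, List, Optional


def run_work_sharing(total_iterations: int, grain_size: int,
                     num_workers: int, work: Callable[[int], int]) -> int:
    """Work-sharing thread pool with one shared FIFO queue (hypothetical).

    total_iterations independent iterations are grouped into tasks of at
    most grain_size iterations; num_workers threads pull tasks in FIFO
    order. Returns the sum of work(i) over all iterations.
    """
    if grain_size <= 0:
        raise ValueError("grain_size must be positive")
    tasks: "queue.Queue[Optional[range]]" = queue.Queue()
    for start in range(0, total_iterations, grain_size):
        tasks.put(range(start, min(start + grain_size, total_iterations)))
    # One sentinel per worker; FIFO order guarantees all real tasks come first.
    for _ in range(num_workers):
        tasks.put(None)

    # One slot per worker, so no lock is needed for the partial results.
    partial_sums: List[int] = [0] * num_workers

    def worker(wid: int) -> None:
        while (chunk := tasks.get()) is not None:
            for i in chunk:
                partial_sums[wid] += work(i)

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial_sums)


if __name__ == "__main__":
    # 1000 iterations in tasks of 64 gives 16 tasks (the last one holds 40).
    print(run_work_sharing(1000, 64, 4, lambda i: i))  # 499500
```

The grain size trades off two costs: a larger `grain_size` means fewer operations on the single shared queue and so less contention, while a smaller one lets idle workers pick up the remaining work in finer pieces. The best setting depends on the workload and hardware.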


“…The Glasgow Parallel Reduction Machine (GPRM) [4] provides a task-based approach to manycore programming by structuring programs into task code, written as C++ classes, and communication code, written in GPC, a restricted subset of C++. The communication code describes how the tasks interact.…”

Section: GPRM (mentioning; confidence: 99%)

“…For this purpose, we have chosen three parallel programming models from different domains and for different reasons: OpenMP, the de facto standard for programming shared-memory architectures; OpenCL, known for being portable across multiple platforms; and finally GPRM, a pure task-based programming framework. It has been reported that GPRM provides superior performance compared to OpenMP on manycore architectures [3] [4].…”

Section: Introduction (mentioning; confidence: 99%)

Comparison of Three Popular Parallel Programming Models on the Intel Xeon Phi

Tousimojarad, Vanderbauwhede, 2014

Lecture Notes in Computer Science

Self Cite


Abstract. Systems with large numbers of cores have become commonplace. Accordingly, applications are shifting towards increased parallelism. In a general-purpose system, applications residing in the system compete for shared resources. Thread and task scheduling in such a multithreaded multiprogramming environment is a significant challenge. In this study, we have chosen the Intel Xeon Phi system as a modern platform to explore how popular parallel programming models, namely OpenMP, Intel Cilk Plus and Intel TBB (Threading Building Blocks) scale on manycore architectures. We have used three benchmarks with different features which exercise different aspects of the system performance. Moreover, a multiprogramming scenario is used to compare the behaviours of these models when all three applications reside in the system. Our initial results show that it is to some extent possible to infer multiprogramming performance from single-program cases.
