2010
DOI: 10.1002/cpe.1627
Performance‐based parallel loop self‐scheduling using hybrid OpenMP and MPI programming on multicore SMP clusters

Abstract: SUMMARY: Parallel loop self-scheduling on parallel and distributed systems has been a critical problem and it is becoming more difficult to deal with in the emerging heterogeneous cluster computing environments. In the past, some self-scheduling schemes have been proposed as applicable to heterogeneous cluster computing environments. In recent years, multicore computers have been widely included in cluster systems. However, previous researches into parallel loop self-scheduling did not consider certain aspects o…

Cited by 10 publications (8 citation statements); references 30 publications.
“…The advantage of static scheduling is that it is easy to implement and incurs no extra scheduling overhead at runtime, but it may cause load imbalance and thereby reduce computational efficiency (Zhu et al., 2004; Plank et al., 2007). Dynamic scheduling adjusts the schedule during execution and is especially suitable for situations where the number of steps, or the workload of each step, is undetermined (Yang et al., 2011). Although dynamic scheduling is better suited to balancing the load among parallel processors, the scheduling itself must be performed at runtime and therefore needs extra processing time.…”
Section: LPS and SWM (mentioning)
Confidence: 98%
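To make the trade-off described in this citation concrete, here is a minimal sketch contrasting OpenMP's static and dynamic schedule clauses on a loop with uneven per-iteration cost; the work() function, problem size, and chunk size of 128 are illustrative assumptions, not values from the cited papers.

```c
/* Sketch: contrasting static and dynamic OpenMP loop scheduling.
 * The work() function and the chunk size are hypothetical placeholders. */
#include <omp.h>
#include <stdio.h>

#define N 1000000

/* Simulated iteration whose cost varies with i, so a static split is uneven. */
static double work(int i) {
    double s = 0.0;
    for (int k = 0; k < i % 1000; k++)
        s += (double)k * 1e-6;
    return s;
}

int main(void) {
    double sum = 0.0;

    /* Static: iterations are divided once before execution, so there is no
     * runtime scheduling overhead, but threads with cheaper iterations
     * finish early (load imbalance). */
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += work(i);

    /* Dynamic: chunks of 128 iterations are handed out at runtime,
     * improving balance at the cost of scheduling overhead. */
    #pragma omp parallel for schedule(dynamic, 128) reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += work(i);

    printf("sum = %f\n", sum);
    return 0;
}
```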
“…This can be done by adjusting the chunk size. Therefore, these schemes cannot achieve load balancing in an extremely heterogeneous CPU-GPU environment (Yang et al., 2011).…”
Section: LPS and SWM (mentioning)
Confidence: 99%
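In the spirit of the performance-based schemes discussed here, chunk-size adjustment can weight each worker's share by a measured performance ratio. The sketch below is only an illustration under assumed values: the weights, the alpha fraction, and the worker count are hypothetical and are not taken from the cited paper.

```c
/* Sketch of performance-weighted chunk sizing. The weights, the alpha
 * ratio, and the worker count are hypothetical assumptions. */
#include <stdio.h>

/* Assign a fraction alpha of the remaining iterations statically, in
 * proportion to each worker's measured performance weight; the remainder
 * would be left for dynamic (chunk-by-chunk) self-scheduling. */
static void weighted_chunks(int remaining, double alpha,
                            const double *weight, int workers, int *chunk) {
    double total = 0.0;
    for (int w = 0; w < workers; w++)
        total += weight[w];
    int statically = (int)(alpha * remaining);
    for (int w = 0; w < workers; w++)
        chunk[w] = (int)(statically * weight[w] / total);
}

int main(void) {
    /* e.g. one fast accelerator-backed node and two slower CPU nodes
     * (hypothetical performance ratios). */
    double weight[3] = {8.0, 1.0, 1.0};
    int chunk[3];
    weighted_chunks(100000, 0.75, weight, 3, chunk);
    for (int w = 0; w < 3; w++)
        printf("worker %d: first chunk = %d iterations\n", w, chunk[w]);
    return 0;
}
```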
“…In our previous approaches [22,23,25,26], the implementation of automatic parallelization based on OpenMP currently supports both C and C++ code. Building on the results of these studies, we enhanced and extended the work.…”
Section: Automatic Parallelization (mentioning)
Confidence: 99%
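As a purely illustrative example of the kind of OpenMP directive an automatic parallelizer could insert into C code (not output from the cited implementations), consider a loop with no loop-carried dependence:

```c
/* Illustrative only: an OpenMP work-sharing directive placed on a
 * data-independent C loop, the simplest case an automatic
 * parallelizer can handle. */
#include <omp.h>

void scale(double *a, const double *b, double c, int n) {
    /* No loop-carried dependence, so iterations may run in parallel. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = c * b[i];
}
```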
“…However, in affinity scheduling the size of each work queue is not determined from knowledge of the loop and the runtime environment. Some groups [9,10] have undertaken self-scheduling studies on particular architectures, taking the features of the system architecture into account. Our technique could easily be extended to these architectures.…”
Section: Background and Related Work (mentioning)
Confidence: 99%
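The following sketch illustrates affinity-style scheduling as characterized in this citation: each thread starts with an equally sized local queue (the split deliberately ignores loop and runtime characteristics, as the citation notes) and takes iterations from other queues once its own is empty. The queue layout, locking, and placeholder loop body are simplifying assumptions, not the exact scheme from the cited works.

```c
/* Sketch of affinity-style scheduling with per-thread work queues.
 * Queue sizes are equal by construction; idle threads drain other
 * threads' queues. Locking and the loop body are simplified. */
#include <omp.h>
#include <stdio.h>

#define N 1024

typedef struct {
    int next;          /* next unprocessed iteration in this queue */
    int end;           /* one past the last iteration in this queue */
    omp_lock_t lock;
} queue_t;

int main(void) {
    int nthreads = omp_get_max_threads();
    queue_t q[64];     /* assumes nthreads <= 64 */
    int per = N / nthreads;

    for (int t = 0; t < nthreads; t++) {
        q[t].next = t * per;
        q[t].end  = (t == nthreads - 1) ? N : (t + 1) * per;
        omp_init_lock(&q[t].lock);
    }

    long done = 0;
    #pragma omp parallel reduction(+:done)
    {
        int me = omp_get_thread_num();
        /* Drain the local queue first, then scan the others for leftovers. */
        for (int victim = me, scanned = 0; scanned < nthreads;
             victim = (victim + 1) % nthreads, scanned++) {
            for (;;) {
                int i = -1;
                omp_set_lock(&q[victim].lock);
                if (q[victim].next < q[victim].end)
                    i = q[victim].next++;
                omp_unset_lock(&q[victim].lock);
                if (i < 0)
                    break;
                done++;        /* placeholder for the real loop body */
            }
        }
    }

    for (int t = 0; t < nthreads; t++)
        omp_destroy_lock(&q[t].lock);
    printf("processed %ld iterations\n", done);
    return 0;
}
```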