“…For this reason, there is considerable interest in studying the impact of programming environments on the efficiency of porting real-life applications to heterogeneous architectures. For systems with MIC accelerators, we used [19] various offload-based programming environments, namely, (1) Intel Offload [24] coupled with OpenMP, (2) the OpenMP Accelerator Model, [11] and (3) the hStreams framework [1] with OpenMP. However, none of them can be applied to GPUs.…”
Section: Related Work (mentioning, confidence: 99%)
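The snippet above names three offload-based environments used earlier for MIC systems. For readers unfamiliar with this style of programming, the following is a minimal, purely illustrative sketch of an offload construct in the OpenMP Accelerator Model; the array, loop, and stencil are placeholders and are not taken from the solidification code.

```c
#include <stdio.h>

#define N 1024

/* Illustrative only: offload a simple grid update to an accelerator with
 * the OpenMP Accelerator Model ('target' construct). The array and the
 * stencil are placeholders, not the solidification kernels. */
int main(void)
{
    static double phi[N], phi_new[N];

    for (int i = 0; i < N; ++i)
        phi[i] = (double)i;

    /* Map the input to the device, run the loop there, map the result back. */
    #pragma omp target map(to: phi[0:N]) map(from: phi_new[0:N])
    #pragma omp parallel for
    for (int i = 1; i < N - 1; ++i)
        phi_new[i] = 0.5 * (phi[i - 1] + phi[i + 1]);

    printf("phi_new[10] = %f\n", phi_new[10]);
    return 0;
}
```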
“…The contributions of this work to areas of parallel computing are as follows:
1. For a real-world scientific application for the numerical modeling of alloy solidification, we provide a comprehensive study of porting applications to heterogeneous computing platforms with GPU accelerators, aiming at achieving a flexible workload distribution between available CPU–GPU resources and optimizing the application performance.
2. Considering the solidification application as a use case, we explore the basic steps required for (i) adaptation of an application to heterogeneous CPU–GPU platforms, based on a reformulation of steps developed previously [19] for CPU–MIC architectures, and then (ii) mapping the application workload onto the OpenCL programming model. As a result, the mapping process allows us to utilize OpenCL for harnessing CPU and GPU cores using data parallelism, as well as for the management of available compute devices with task parallelism.
3. Experimental evaluation of the performance of the resulting OpenCL code on two platforms with powerful GPUs of various generations (with Kepler and Volta architectures) confirms the performance advantage of using computing resources of both GPUs and CPUs.…”
Section: Introduction (mentioning, confidence: 99%)
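The contribution quoted above refers to managing the available compute devices with task parallelism. The sketch below is a hypothetical host-side fragment, not the authors' code, showing one common way to realize this in OpenCL: discovering a CPU and a GPU device and giving each its own command queue so that work issued to the two devices can proceed concurrently.

```c
/* Hypothetical sketch: find one CPU and one GPU device and give each its own
 * in-order command queue, so independent tasks (kernels, transfers) can run
 * on both devices concurrently. Error handling is reduced to a single macro. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

#define CHECK(err, msg) \
    do { if ((err) != CL_SUCCESS) { fprintf(stderr, "%s failed: %d\n", msg, (int)(err)); exit(1); } } while (0)

int main(void)
{
    cl_int err;
    cl_uint nplat = 0;
    CHECK(clGetPlatformIDs(0, NULL, &nplat), "clGetPlatformIDs");
    if (nplat == 0) { fprintf(stderr, "no OpenCL platforms found\n"); return 1; }

    cl_platform_id *plats = malloc(nplat * sizeof(*plats));
    CHECK(clGetPlatformIDs(nplat, plats, NULL), "clGetPlatformIDs");

    /* Take the first CPU device and the first GPU device found on any platform. */
    cl_device_id cpu = NULL, gpu = NULL;
    for (cl_uint p = 0; p < nplat; ++p) {
        if (!cpu) clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_CPU, 1, &cpu, NULL);
        if (!gpu) clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_GPU, 1, &gpu, NULL);
    }
    if (!cpu || !gpu) { fprintf(stderr, "need both a CPU and a GPU device\n"); return 1; }

    /* One context per device keeps the example portable across vendors. */
    cl_context cpu_ctx = clCreateContext(NULL, 1, &cpu, NULL, NULL, &err); CHECK(err, "clCreateContext(cpu)");
    cl_context gpu_ctx = clCreateContext(NULL, 1, &gpu, NULL, NULL, &err); CHECK(err, "clCreateContext(gpu)");

    /* Separate command queues realize task parallelism: commands enqueued on
     * the CPU queue and on the GPU queue proceed independently until we
     * synchronize explicitly. */
    cl_command_queue cpu_q = clCreateCommandQueue(cpu_ctx, cpu, 0, &err); CHECK(err, "queue(cpu)");
    cl_command_queue gpu_q = clCreateCommandQueue(gpu_ctx, gpu, 0, &err); CHECK(err, "queue(gpu)");

    /* ... build programs, create kernels, and enqueue work on both queues ... */

    clReleaseCommandQueue(cpu_q); clReleaseCommandQueue(gpu_q);
    clReleaseContext(cpu_ctx);    clReleaseContext(gpu_ctx);
    free(plats);
    return 0;
}
```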
“…In our previous works, [9, 16-19] we dealt with porting and optimizing the application for modeling alloy solidification on heterogeneous computing platforms with Intel MIC (also known as Xeon Phi) accelerators, [8] without significant modifications of the code. The last two articles present an approach that takes advantage of using both CPUs and MICs for parallel execution of the application workload, where all available cores of both devices are utilized coherently to solve the modeling problem.…”
(mentioning, confidence: 99%)
“…2. Considering the solidification application as a use case, we explore the basic steps required for (i) adaptation of an application to heterogeneous CPU–GPU platforms, based on a reformulation of steps developed previously [19] for CPU–MIC architectures, and then (ii) mapping the application workload onto the OpenCL programming model. As a result, the mapping process allows us to utilize OpenCL for harnessing CPU and GPU cores using data parallelism, as well as for the management of available compute devices with task parallelism.…”
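For the data-parallel side of this mapping, an OpenCL C kernel typically assigns one work-item per grid node. The kernel below is a hypothetical placeholder (a simple 5-point stencil), shown only to illustrate this shape; the real solidification kernels compute different field updates.

```c
/* Hypothetical OpenCL C kernel, not taken from the article: each work-item
 * updates one node of a 2D grid, which is the data-parallel mapping the
 * quoted contribution refers to. The 5-point stencil is a placeholder for
 * the actual solidification field updates. */
__kernel void update_node(__global const float *phi_old,
                          __global float *phi_new,
                          const int width,
                          const int height)
{
    int x = get_global_id(0);
    int y = get_global_id(1);

    /* Skip boundary nodes; a real code handles them separately. */
    if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1)
        return;

    int idx = y * width + x;
    phi_new[idx] = 0.25f * (phi_old[idx - 1] + phi_old[idx + 1] +
                            phi_old[idx - width] + phi_old[idx + width]);
}
```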
Summary
This article provides a comprehensive study of OpenCL heterogeneous programming for porting applications to CPU–GPU computing platforms, using a real-life application for solidification modeling. The aim is to achieve a flexible workload distribution between available CPU–GPU resources and to optimize the application performance. Considering the solidification application as a use case, we explore the steps required for (i) adapting the application to CPU–GPU platforms and (ii) mapping the application workload onto the OpenCL programming model. The adaptation is based on a reformulation of steps developed previously for CPU–MIC architectures. The mapping process allows us to utilize OpenCL for harnessing CPU and GPU cores with data parallelism, as well as for managing the available compute devices with task parallelism. The performance and energy efficiency of the resulting OpenCL code are studied experimentally on two platforms with powerful GPUs of different generations (Kepler and Volta architectures). The experiments confirm the performance advantage of using the computing resources of both GPUs and CPUs. The achieved benefit depends on the relationship between the computing power of the CPUs and GPUs. Moreover, this gain entails a growth in average power, which increases the energy consumed during application execution.
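As an illustration of the flexible CPU–GPU workload distribution mentioned in the summary, the following hypothetical fragment splits a one-dimensional index space between a GPU queue and a CPU queue according to an assumed tuning parameter gpu_share. The queues and kernels are presumed to have been created as in the earlier device-management sketch; this shows the general technique, not the paper's implementation.

```c
/* Hypothetical sketch of a CPU-GPU workload split: a 1D index space of
 * 'total' work-items is divided according to 'gpu_share' (an assumed tuning
 * parameter, 0 < gpu_share < 1), and each part is enqueued on its own device
 * queue. Because the two devices live in separate contexts, each device needs
 * its own cl_kernel object for the same kernel source. */
#include <CL/cl.h>

void launch_split(cl_command_queue cpu_q, cl_kernel cpu_kernel,
                  cl_command_queue gpu_q, cl_kernel gpu_kernel,
                  size_t total, double gpu_share)
{
    size_t gpu_items = (size_t)(total * gpu_share); /* e.g. 0.8 -> 80% on the GPU */
    size_t cpu_items = total - gpu_items;
    size_t zero = 0;

    /* GPU takes the range [0, gpu_items), CPU takes [gpu_items, total). */
    clEnqueueNDRangeKernel(gpu_q, gpu_kernel, 1, &zero,      &gpu_items, NULL, 0, NULL, NULL);
    clEnqueueNDRangeKernel(cpu_q, cpu_kernel, 1, &gpu_items, &cpu_items, NULL, 0, NULL, NULL);

    /* Submit both sub-ranges so the devices compute concurrently, then wait. */
    clFlush(gpu_q);
    clFlush(cpu_q);
    clFinish(gpu_q);
    clFinish(cpu_q);
}
```

In such a scheme, gpu_share is the knob that realizes the workload distribution; it can be fixed by benchmarking or adjusted between time steps.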
Summary
Intel's Xeon Phi combines the parallel processing power of a many-core accelerator with the programming ease of CPUs. In this paper, we present a survey of works that study the architecture of the Phi and use it as an accelerator for a broad range of applications. We review performance optimization strategies, as well as the factors that bottleneck the performance of the Phi. We also review works that compare the Phi with CPUs and GPUs or execute on them collaboratively. This paper will be useful for researchers and developers in the areas of computer architecture and high-performance computing.
Summary
This work is part of the global trend toward using modern computing systems for modeling phase-field phenomena. The main goal of this article is to improve the performance of a parallel application for solidification modeling, assuming a dynamic intensity of computations in successive time steps, when calculations are performed only for a carefully selected group of nodes in the grid. A two-step method is proposed to optimize the application for multi-/manycore architectures. In the first step, loop fusion is used to execute all kernels in a single nested loop and to reduce the number of conditional operators. These modifications are vital for implementing the second step, which includes an algorithm for dynamic workload prediction and load balancing across the cores of a computing platform. Two versions of the algorithm are proposed, with 1D and 2D maps used for predicting the computational domain within the grid. The proposed optimizations increase the application performance significantly for all tested configurations of computing resources. The highest performance gain is achieved for two Intel Xeon Platinum 8180 CPUs, where the new code based on the 2D map yields a speedup of up to 2.74 times, while using the proposed method with the 2D map for a single KNL accelerator reduces the execution time by up to 1.91 times.
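To make the two-step method more concrete, the following is a hypothetical C/OpenMP sketch, under assumed data structures, of (1) per-node kernels fused into a single nested loop and (2) a 2D activity map from the previous time step used to restrict and dynamically balance the work across cores. It illustrates the idea only and is not the authors' implementation.

```c
/* Hypothetical sketch of the two-step idea, not the authors' code:
 * (1) the per-node kernels are fused into one nested loop, and
 * (2) a 2D map recorded in the previous time step predicts which nodes are
 *     active, so only the predicted sub-domain is scheduled across cores. */
#include <stdbool.h>

#define NX 2048
#define NY 2048

typedef struct { double phi, c, T; } node_t;    /* placeholder field set */

void fused_step(node_t grid[NY][NX], node_t out[NY][NX],
                bool active_map[NY][NX],        /* 2D map from the previous step */
                int row_lo[NY], int row_hi[NY]) /* per-row active range predicted from the map */
{
    /* Dynamic scheduling balances rows of unequal cost across CPU cores. */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int j = 1; j < NY - 1; ++j) {
        for (int i = row_lo[j]; i < row_hi[j]; ++i) {
            if (!active_map[j][i])
                continue;                       /* node predicted inactive: skip all kernels */
            /* Fused "kernels": all field updates for a node in one loop body. */
            out[j][i].phi = 0.25 * (grid[j][i - 1].phi + grid[j][i + 1].phi +
                                    grid[j - 1][i].phi + grid[j + 1][i].phi);
            out[j][i].c   = grid[j][i].c;       /* placeholder concentration update */
            out[j][i].T   = grid[j][i].T;       /* placeholder temperature update   */
        }
    }
}
```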