Using Intel Xeon Phi Coprocessor to Accelerate Computations in MPDATA Algorithm

Szustak, Łukasz; Rojek, Krzysztof; Gepner, Paweł

doi:10.1007/978-3-642-55224-3_54

Cited by 23 publications

(20 citation statements)

References 6 publications

“…However, after empirical performance and programmability studies performed by many researchers [15,42,47,48] it is clear that to achieve high performance, Intel Xeon Phi still needs help from programmers, and that merely relying on compilers with traditional programming models is still far from reality. In fact, high degree of parallelism of Xeon Phi accelerators is best suited to applications that are structured to use the parallelism.…”

Section: Intel MIC Architecture (mentioning)

confidence: 99%

Adaptation of RBM Learning for Intel MIC Architecture

Olas, Mleczko, Nowicki, et al. 2015

Artificial Intelligence and Soft Computing

Abstract. In the paper, the parallel realization of the Restricted Boltzmann Machine (RBM) is proposed. The implementation intends to use multicore architectures of modern CPUs and the Intel Xeon Phi coprocessor. The learning procedure is based on the matrix description of RBM, where the learning samples are grouped into packages, and represented as matrices. The influence of the package size on convergence of learning, as well as on performance of computation, is studied for various numbers of threads, using conventional CPU and Intel Xeon Phi architectures. Our research confirms a potential usefulness of MIC parallel architecture for implementation of RBM and similar algorithms.

“…This method combines the island-of-core strategy with the (3+1)D hierarchical decomposition proposed previously in [19,21]. The efficiency of the method is evaluated for the implementation of MPDATA on the SGI UV 2000 and UV 3000 servers, as well as 2- and 4-socket ccNUMA platforms based on various Intel CPU microarchitectures, including Skylake, Broadwell, and Haswell.…”

Section: Introduction (mentioning)

confidence: 99%

“…To alleviate the memory-bound nature of MPDATA, we proposed [19][20][21] a new strategy of workload distribution for heterogeneous stencils computations. The main aim was to better exploit the cache hierarchy by moving the bulk of data traffic from the main memory to the cache hierarchy.…”

Section: Introduction To Parallelization Of MPDATA Application On Sha (mentioning)

confidence: 99%

Unleashing the performance of ccNUMA multiprocessor architectures in heterogeneous stencil computations

et al. 2018

Self Cite

This paper meets the challenge of harnessing the heterogeneous communication architecture of ccNUMA multiprocessors for heterogeneous stencil computations, an important example of which is the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA). We propose a method for optimization of parallel implementation of heterogeneous stencil computations that is a combination of the islands-of-core strategy and (3+1)D decomposition. The method allows a flexible management of the trade-off between computation and communication costs in accordance with features of modern ccNUMA architectures. Its efficiency is demonstrated for the implementation of MPDATA on the SGI UV 2000 and UV 3000 servers, as well as for 2- and 4-socket ccNUMA platforms based on various Intel CPU architectures, including Skylake, Broadwell, and Haswell.

“…This method is based on the partitioning of available cores/threads into independent work teams. This paper is an extended version of work presented in [1,12]. This study not only proposes modifications in the (3 + 1)D decomposition of MPDATA, but also introduces the notion of work team partitioning.…”

Section: Introduction (mentioning)

confidence: 99%

“…As a result, modern processor architectures are very unbalanced concerning the relation of theoretical peak performance versus memory bandwidth [1]. One of the main problems of porting codes to the latest computing platforms is to take the full advantage of memory hierarchies.…”

Section: Introduction (mentioning)

confidence: 99%

Adaptation of MPDATA Heterogeneous Stencil Computation to Intel Xeon Phi Coprocessor

Szustak, Rojek, Olas, et al. 2015

Scientific Programming

Self Cite

The multidimensional positive definite advection transport algorithm (MPDATA) belongs to the group of nonoscillatory forward-in-time algorithms and performs a sequence of stencil computations. MPDATA is one of the major parts of the dynamic core of the EULAG geophysical model. In this work, we outline an approach to adaptation of the 3D MPDATA algorithm to the Intel MIC architecture. In order to utilize available computing resources, we propose the (3 + 1)D decomposition of MPDATA heterogeneous stencil computations. This approach is based on combination of the loop tiling and fusion techniques. It allows us to ease memory/communication bounds and better exploit the theoretical floating point efficiency of target computing platforms. An important method of improving the efficiency of the (3 + 1)D decomposition is partitioning of available cores/threads into work teams. It permits for reducing inter-cache communication overheads. This method also increases opportunities for the efficient distribution of MPDATA computation onto available resources of the Intel MIC architecture, as well as Intel CPUs. We discuss preliminary performance results obtained on two hybrid platforms, containing two CPUs and Intel Xeon Phi. The top-of-the-line Intel Xeon Phi 7120P gives the best performance results, and executes MPDATA almost 2 times faster than two Intel Xeon E5-2697v2 CPUs.
