International audienceGeneral purpose (GP)GPU programming demands to couple highly parallel computing units with classic CPUs to obtain a high performance. Heterogenous systems lead to complex designs combining multiple paradigms and programming languages to manage each hardware architecture. In this paper, we present tools to harness GPGPU programming through the high-level OCaml programming language. We describe the SPOC library that allows to handle GPGPU subprograms (kernels) and data transfers between devices. We then present how SPOC expresses GPGPU kernel: through interoperability with common low-level extensions (from Cuda and OpenCL frameworks) but also via an embedded DSL for OCaml. Using simple benchmarks as well as a real world HPC software, we show that SPOC can offer a high performance while efficiently easing development. To allow better abstractions over tasks and data, we introduce some parallel skeletons built upon SPOC as well as composition constructs over those skeletons
Abstract-Differences in simulation results may be observed from one architecture to another or even inside the same architecture. Such reproducibility failures are often due to different rounding errors generated by different orders in the sequence of arithmetic operations. Reproducibility problems are particularly noticeable on new computing architectures such as multicore processors or GPUs (Graphics Processing Units). DSA (Discrete Stochastic Arithmetic) enables one to estimate rounding error propagation in simulation programs. In this paper, it is shown that DSA can be used to estimate which digits in simulation results may be different from one environment to another because of rounding errors. A particular implementation of DSA, which enables numerical validation in hybrid CPU-GPU environments, is described. The estimation of numerical reproducibility using DSA is illustrated by a wave propagation code which can be affected by reproducibility problems when executed on different architectures.
Oil and gas companies rely on high performance computing to process seismic imaging algorithms such as reverse time migration. Graphics processing units are used to accelerate reverse time migration, but these deployments suffer from limitations such as the lack of high graphics processing unit memory capacity, frequent CPU-GPU communications that may be bottlenecked by the PCI bus transfer rate, and high power consumptions. Recently, AMD has launched the Accelerated Processing Unit (APU): a processor that merges a CPU and a graphics processing unit on the same die featuring a unified CPU-GPU memory. In this paper, we explore how efficiently may the APU be applicable to reverse time migration. Using OpenCL (along with MPI and OpenMP), a CPU/APU/GPU comparative study is conducted on a single node for the 3D acoustic reverse time migration, and then extended on up to 16 nodes. We show the relevance of overlapping the I/O and MPI communications with the computations for the APU and graphics processing unit clusters, that performance results of APUs range between those of CPUs and those of graphics processing units, and that the APU power efficiency is greater than or equal to the graphics processing unit one.
International audienceThe gradient method with retards (GMR) is a nonmonotone iterative method recently developed to solve large, sparse, symmetric, and positive definite linear systems of equations. Its performance depends on the retard parameter $\overline{m}$. The larger the $\overline{m}$, the faster the convergence, but also the faster the loss of precision is observed in the intermediate computations of the algorithm. This loss of precision is mainly produced by the nonmonotone behavior of the norm of the gradient which also increases with $\overline{m}$. In this work, we first use a recently developed inexpensive technique to smooth down the nonmonotone behavior of the method. Then we show that it is possible to choose $\overline{m}$ adaptively during the process to avoid loss of precision. Our adaptive choice of $\overline{m}$ can be viewed as a compromise between numerical stability and speed of convergence. Numerical results on some classical test problems are presented to illustrate the good numerical properties
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.