Ana Moreton-Fernandez scite author profile

Ortega–Arranz

The International Journal of High Performance Computing Applica

2017

Nowadays the use of hardware accelerators, such as the Graphics Processing Units (GPUs) or XeonPhi coprocessors, is key to solve computationally costly problems that require High Performance Computing (HPC). However, programming solutions for an efficient deployment in this kind of devices is a very complex task that relies on the manual management of memory transfers and configuration parameters. The programmer has to carry out a deep study of the particular data needed to be computed at each moment, in different computing platforms, also considering architectural details. We introduce the Controller concept as an abstract entity that allows the programmer to easily manage the communications and kernel launching details on hardware accelerators in a transparent way. This model also provides the possibility of defining and launching CPU kernels in multi-core processors with the same abstraction and methodology used for the accelerators. It internally combines different native programming models and technologies to exploit the potential of each kind of device. Additionally, the model also allows the programmer to simplify the proper selection of values for several configuration parameters that can be selected when a kernel is launched. This is done through a qualitative characterization process of the kernel code to be executed. Finally, we present the implementation of the Controller model in a prototype library, together with its application in several case studies. Its use has led to reductions in the development and porting costs, with significantly low overheads in the execution times when compared to manually programmed and optimized solutions using directly CUDA and OpenMP.

Exploiting distributed and shared memory hierarchies with Hitmap

Llanos

2014

Current multicomputers are typically built as interconnected clusters of shared-memory multicore computers. A common programming approach for these clusters is to simply use a message-passing paradigm, launching as many processes as cores available. Nevertheless, to better exploit the scalability of these clusters and highly-parallel multicore systems, it is needed to efficiently use their distributed-and shared-memory hierarchies. This implies to combine different programming paradigms and tools at different levels of the program design.This paper presents an approach to ease the programming for mixed distributed and shared memory parallel computers. The coordination at the distributed memory level is simplified using Hitmap, a library for distributed computing using hierarchical tiling of data structures. We show how this tool can be integrated with shared-memory programming models and automatic code generation tools to efficiently exploit the multicore environment of each multicomputer node. This approach allows to exploit the most appropriate techniques for each model, easily generating multilevel parallel programs that automatically adapt their communication and synchronization structures to the target machine. Our experimental results show how this approach mimics or even improves the best performance results obtained with manually optimized codes using pure MPI or OpenMP models.

Supporting the Xeon Phi Coprocessor in a Heterogeneous Programming Model

Rodriguez-Gutiez

et al. 2017

Supercomputers are becoming more heterogeneous. They are composed by several machines with different computation capabilities and different kinds and families of accelerators, such as GPUs or Intel Xeon Phi coprocessors. Programming these machines is a hard task, that requires a deep study of the architectural details, for exploiting efficiently each computational unit. In this paper, we present an extension of a heterogeneous programming model, in order to also support Intel Xeon Phi coprocessors. This contribution extends an existing heterogeneous programming library, by taking advantage of both the GPU communication model and the CPU execution model of the original library. Our experimental results show that using our approach, the programming effort needed for changing the target devices is highly reduced, by for example reusing the 97% of the code between a GPU implementation and a Xeon Phi implementation for the Mandelbrot benchmark.

Multi-device Controllers: A Library to Simplify Parallel Heterogeneous Programming

Llanos

2017

Int J Parallel Prog

Current HPC clusters are composed by several machines with different computation capabilities and different kinds and families of accelerators. Programming efficiently for these heterogeneous systems has become an important challenge. There are many proposals to simplify the programming and management of accelerator devices, and the hybrid programming, mixing accelerators and CPU cores. However, the portability compromises in many cases the efficiency on different devices, and there are details about the coordination of different types of devices that should be still tackled by the programmer.In this work we introduce the Multi-Controler (MCtrl), an abstract entity implemented in a library, that coordinates the management of heterogeneous devices, including accelerators with different capabilities and sets of CPUcores. Our proposal improves state-of-the-art solutions, simplifying the data partition, mapping, and transparent deployment of both, simple generic kernels portable across different device types, and specialized implementations defined and optimized using specific native or vendor programming models (such as CUDA for NVIDIA's GPUs, or OpenMP for CPU-cores). The runtime system automatically selects and deploys the most appropriate implementation of each kernel for each device, managing the data movements, and hiding the launching details. Results of an experimental study with five study cases indicates that our abstraction allows the development of flexible and high efficient programs, that adapt to the heterogeneous environment.

On the run-time cost of distributed-memory communications generated using the polyhedral model

Llanos

2015