Exploiting the computing power of the diversity of resources available on heterogeneous systems is mandatory but a very challenging task. The diversity of architectures, execution models and programming tools, together with disjoint address spaces and different computing capabilities, raise a number of challenges that severely impact on application performance and programming productivity. This problem is further compounded in the presence of data parallel irregular applications. This paper presents a framework that addresses development and execution of data parallel irregular applications in heterogeneous systems. A unified task-based programming and execution model is proposed, together with inter and intra-device scheduling, which, coupled with a data management system, aim to achieve performance scalability across multiple devices, while maintaining high programming productivity. Intra-device scheduling on wide SIMD/SIMT architectures resorts to consumer-producer kernels, which, by allowing dynamic generation and rescheduling of new work units, enable balancing irregular workloads and increase resource utilization. Results show that regular and irregular applications scale well with the number of devices, while requiring minimal programming effort. Consumer-producer kernels are able to sustain significant performance gains as long as the workload per basic work unit is enough to compensate overheads associated with intra-device scheduling. This not being the case, consumer kernels can still be used for the irregular application. Comparisons with an alternative framework, StarPU, which targets regular workloads, consistently demonstrate significant speedups. This is, to the best of our knowledge, the first published integrated approach that successfully handles irregular workloads over heterogeneous systems.
CFD simulations are a fundamental engineering application, implying huge workloads, often with dynamic behaviour due to runtime mesh refinement. Parallel processing over heterogeneous distributed memory clusters is often used to process such workloads. The execution of dynamic workloads over a set of heterogeneous resources leads to load imbalances that severely impacts execution time, when static uniform load distribution is used. This paper proposes applying dynamic, heterogeneity aware, load balancing techniques within CFD simulations. nSharma, a software package that fully integrates with OpenFOAM, is presented and assessed. Performance gains are demonstrated, achieved by reducing busy times standard deviation among resources, i.e. heterogeneous computing resources are kept busy with useful work due to an effective workload distribution. To best of authors' knowledge, nSharma is the first implementation and integration of heterogeneity aware load balancing in OpenFOAM and will be made publicly available in order to foster its adoption by the large community of OpenFOAM users.
Contents of this paper were reviewed by the Technical Committee of the 16 th International Congress of the Brazilian Geophysical Society and do not necessarily represent any position of the SBGf, its officers or members. Electronic reproduction or storage of any part of this paper for commercial purposes without the written consent of the Brazilian Geophysical Society is prohibited.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.