Con sis te n t * Comple te * W ell Docu m e n te d * Easy to R e us e * * E v a lu ate d * P oP * A r t ifact * A EC P PExposing massive parallelism on 3D unstructured meshes computation with efficient load balancing and minimal synchronizations is challenging. Current approaches relying on domain decomposition and mesh coloring struggle to scale with the increasing number of cores per nodes, especially with new many-core processors. In this paper, we propose an hybrid approach using domain decomposition to exploit distributed memory parallelism, Divide-and-Conquer, D&C, to exploit shared memory parallelism and improve locality, and mesh coloring at core level to exploit vectors. It illustrates a new trade-off for many-cores between structuredness, memory locality, and vectorization. We evaluate our approach on the finite element matrix assembly of an industrial fluid dynamic code developed by Dassault Aviation. We compare our D&C approach to domain decomposition and to mesh coloring. D&C achieves a high parallel efficiency, a good data locality as well as an improved bandwidth usage. It competes on current nodes with the optimized pure MPI version with a minimum 10% speed-up. D&C shows an impressive 319x strong scaling on 512 cores (32 nodes) with only 2000 vertices per core. Finally, the Intel Xeon Phi version has a performance similar to 10 Intel E5-2665 Xeon Sandy Bridge cores and 95% parallel efficiency on the 60 physical cores. Running on 4 Xeon Phi (240 cores), D&C has 92% efficiency on the physical cores and performance similar to 33 Intel E5-2665 Xeon Sandy Bridge cores.
To take full advantage of future HPC systems, hybrid parallelization strategies are required. In a previous work, we demonstrated the Divide and Conquer, D&C, approach for efficient parallelization of finite element methods on unstructured meshes. In this paper we experiment the concept of proto-application as a proxy between computer scientists and application developers on a real industrial use-case. The D&C library has been entirely developed on the proto-application and then validated on the original application. We also ported the D&C library to another fluid dynamic application, AETHER, developed by Dassault Aviation. The results show that the speed-up validated on the proto-application can be reproduced on other full scale applications using similar computational patterns.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.