In this work, a new partition-collocation strategy for the parallel execution of CFD-DEM couplings is investigated. Good parallel performance is a key requirement for any Eulerian-Lagrangian software that aims to solve industrially significant problems, as the computational cost of these couplings is one of their main drawbacks. The approach presented here consists in co-locating the overlapping parts of the simulation domains of the two software packages on the same MPI process, in order to reduce the cost of the data exchanges. It is shown how this strategy reduces memory consumption and inter-process communication between CFD and DEM to a minimum, thereby overcoming an important parallelization bottleneck identified in the literature. Three benchmarks are proposed to assess the consistency and scalability of this approach. A coupled execution on 280 cores shows that less than 0.1% of the total time is spent on inter-physics data exchange.
Introduction

Eulerian-Lagrangian couplings are nowadays widely used to address engineering and technical problems. In particular, CFD-DEM couplings have been successfully applied to study several configurations ranging from mechanical [14,13] to chemical [11] and environmental [8] engineering.

CFD-DEM coupled simulations are normally very computationally intensive and, as already pointed out in [10], the execution time represents a major obstacle to the applicability of this numerical approach to complex scenarios. Optimizing the parallel performance of such a coupling is therefore a fundamental step towards large-scale numerical solutions of industrial and technical problems.

The parallelization of Eulerian-Lagrangian software is, however, rather delicate. This is mainly because the optimal partitioning strategies for the two frameworks differ, and because the memory requirement of a coupled solution can represent a major performance issue. Furthermore, since the coupling normally affects an extended region (often the whole computational domain), the amount of information to be exchanged is typically large. For this reason, highly efficient coupling approaches for boundary problems, such as the one proposed in [4], may suffer from the extensive communication layer. At the same time, owing to the Eulerian-Lagrangian nature of the coupling, mesh-based communication schemes such as the one proposed in [9] cannot, by themselves, take care of the information exchange.

One of the earliest attempts to parallelize a DEM algorithm was proposed in [21], where the authors distributed inter-particle contacts among processors on a machine featuring 512 cores. In their scheme, all particle data was stored on every process, resulting in a memory-intensive computation that led to a speedup of 8.73 on 512 cores for a 1672-particle assembly.
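The co-location idea described above can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names (`block_of`, `partition`) and the one-dimensional block decomposition are assumptions made for illustration only. The point it demonstrates is that when the CFD cells and the DEM particles of the same spatial block are assigned to the same rank, the fluid-particle coupling terms can be computed rank-locally, without an inter-process message.

```python
# Illustrative sketch (hypothetical names, 1-D block decomposition):
# co-locating the overlapping CFD and DEM partitions on one rank makes
# the CFD<->DEM data exchange a rank-local operation.

def block_of(x, domain_length, n_ranks):
    """Map a coordinate to the rank owning its spatial block."""
    i = int(x / domain_length * n_ranks)
    return min(i, n_ranks - 1)  # clamp the domain's right boundary

def partition(cell_centers, particle_positions, domain_length, n_ranks):
    """Co-located partition: a CFD cell and a DEM particle that share a
    spatial block land on the same rank, so the coupling terms for that
    particle need no inter-rank communication."""
    cfd_owner = {c: block_of(x, domain_length, n_ranks)
                 for c, x in cell_centers.items()}
    dem_owner = {p: block_of(x, domain_length, n_ranks)
                 for p, x in particle_positions.items()}
    return cfd_owner, dem_owner

if __name__ == "__main__":
    cells = {f"c{i}": 0.5 + i for i in range(8)}      # cell centers in [0, 8)
    parts = {f"p{j}": 0.3 + 2 * j for j in range(4)}  # particle positions
    cfd, dem = partition(cells, parts, domain_length=8.0, n_ranks=4)
    # every particle is owned by the rank of the block that contains it,
    # which is also the rank owning the surrounding fluid cells
    assert all(dem[p] == block_of(x, 8.0, 4) for p, x in parts.items())
```

In a real coupling the two solvers keep independent, differently shaped partitions of their own data; the sketch only captures the constraint that the two decompositions agree spatially, which is what removes the inter-physics communication step.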
A later work [12] showed how, by reducing the DEM inter-process communication, a speedup of ∼11 could be obtained on 16 processes for a computation with 100k particles. This proved how, for the sole Lagrangian software, the memo...