Numerically solving the Vlasov-Poisson equations for initially cold systems can be reduced to following the evolution of a three-dimensional sheet in six-dimensional phase-space. We describe a public parallel numerical algorithm that represents the phase-space sheet with a conforming, self-adaptive simplicial tessellation whose vertices follow the Lagrangian equations of motion. The algorithm is implemented in both six- and four-dimensional phase-space. The tessellation mesh is refined using the bisection method and a local second-order representation of the phase-space sheet, which relies on additional tracers created when needed at runtime. To best preserve the Hamiltonian nature of the system, refinement is anisotropic and constrained by measurements of local Poincaré invariants. The Poisson equation is solved using the fast Fourier method on a regular rectangular grid, as in particle-in-cell codes. To compute the density projected onto this grid, the intersection of the tessellation and the grid is calculated using the method of Franklin and Kankanhalli [64,65,66], generalised to linear order. As preliminary tests of the code, we study in four-dimensional phase-space the evolution of an initially small patch in a chaotic potential and the cosmological collapse of a fluctuation composed of two sinusoidal waves. We also perform a "warm" dark matter simulation in six-dimensional phase-space, which we use to check the parallel scaling of the code.
arXiv:1509.07720v1 [physics.comp-ph] 23 Sep 2015
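As a rough illustration of the Fourier-space Poisson solve on a regular grid mentioned above, the following minimal sketch solves a one-dimensional periodic problem. It is not the code described in this paper. It assumes normalised units in which the equation reads d²φ/dx² = ρ, a periodic box, and a grid size that is a power of two. The function names `fft` and `solve_poisson_periodic` are hypothetical, and a small pure-Python radix-2 transform stands in for an optimised FFT library.

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT.

    The length must be a power of two. The inverse transform is left
    unnormalised; the caller divides by the length.
    """
    n = len(values)
    if n == 1:
        return list(values)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def solve_poisson_periodic(rho, box_length):
    """Solve d^2 phi / dx^2 = rho on a periodic 1D grid (assumed units).

    The k = 0 mode is set to zero, so the mean of rho is discarded and
    the returned potential has zero mean.
    """
    n = len(rho)
    rho_hat = fft([complex(r) for r in rho])
    phi_hat = [0j] * n
    for j in range(1, n):
        mode = j if j <= n // 2 else j - n  # signed mode number
        k = 2.0 * math.pi * mode / box_length
        phi_hat[j] = -rho_hat[j] / (k * k)  # -k^2 phi_hat = rho_hat
    phi = fft(phi_hat, inverse=True)
    return [p.real / n for p in phi]


# Check against an analytic case: rho = sin(x) gives phi = -sin(x).
n = 32
box = 2.0 * math.pi
x = [box * i / n for i in range(n)]
rho = [math.sin(xi) for xi in x]
phi = solve_poisson_periodic(rho, box)
max_error = max(abs(p + math.sin(xi)) for p, xi in zip(phi, x))
```

In three dimensions the same division by -|k|² is applied mode by mode after a three-dimensional transform of the projected density. The physical prefactor of the source term, omitted here, depends on the chosen units.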
Implementation
Supercomputers featuring large clusters of shared-memory nodes are becoming the norm, with a continuing trend towards more cores per node. Taking advantage of such processing power is challenging, especially for problems such as gravitational dynamics that are in essence non-local and for which significant inter-process communication cannot be avoided. Scaling up to (tens of) thousands of cores using pure MPI communication, the standard message passing interface for distributed-memory computers, often results in numerous messages being sent all over the network, triggering traffic contention that almost invariably ends up being the limiting performance factor. One way to alleviate this problem is a hybrid approach that combines coarse-grained MPI parallelism with local shared-memory multiprocessing via, for instance, OpenMP. Indeed, MPI parallelism is often achieved through domain decomposition, each MPI-distributed sub-domain communicating preferentially with its direct neighbors, but also potentially with all other sub-domains via all-to-all communications. Local OpenMP-style parallelism allows for larger and fewer sub-domains. In this case, neighbor-to-neighbor communications, which are often achieved via buffer regions called "ghost layers" that locally keep track of updates to the boundaries of neighboring domains, are reduced, since these regions typically scale like the surface o...