Reliable mesh-based simulations are needed to solve complex engineering problems. Mesh adaptivity can increase reliability by reducing discretization errors, but it requires multiple software components to exchange information. Often, components exchange information by reading and writing a common file format. This file-based approach becomes a problem on massively parallel computers, where filesystem bandwidth is a critical performance bottleneck. Our approach, based on data streams and component interfaces, avoids the filesystem bottleneck. In this paper, we present these techniques and their use in coupling mesh adaptivity to the PHASTA computational fluid dynamics solver, the Albany multi-physics framework, and the Omega3P linear accelerator frequency analysis application. Performance results are reported on up to 16,384 cores of an Intel Knights Landing-based system.
KEYWORDS: in-memory, mesh adaptation, parallel, unstructured mesh, workflow
INTRODUCTION

Simulations on massively parallel systems are most effective when data movement is minimized. Data movement costs increase with the depth of the memory hierarchy, a design trade-off made to gain capacity. For example, the lowest level of on-node storage in the IBM Blue Gene/Q A2 processor,1 the per-core 16 KiB L1 cache (excluding registers), has a peak bandwidth of 819 GiB/s. The highest level of on-node storage, 16 GiB of DDR3 main memory, provides a million times more capacity but at a greatly reduced bandwidth of 43 GiB/s, roughly 1/19th that of the L1 cache.2 One level further up the hierarchy is the parallel filesystem.* At this level, the bandwidth-to-capacity relationship is again less favorable and is further compromised by the fact that the filesystem is a shared resource. Table 1 lists the per-node peak main memory and filesystem bandwidths across five generations of Argonne National Laboratory leadership-class systems, i.e., Blue Gene/L,5,6 Intrepid Blue Gene/P,7,8 Mira Blue Gene/Q,1,9 Theta,10,11 and 2018's Aurora.12 Based on these peak values, the bandwidth gap between main memory and the filesystem is at least three orders of magnitude. To maximize performance, software must exploit the bandwidth advantage of cache and main memory during as many workflow operations as possible.

This paper presents a set of in-memory component coupling techniques that avoid filesystem use. We demonstrate these techniques for three different unstructured mesh-based adaptive analysis workflows. These demonstrations highlight the need for in-memory coupling techniques that are compatible with the design and execution of the analysis software involved. Key to this compatibility is supporting two interaction modes, i.e., bulk and atomic information transfers.

Section 3 defines the information transfer modes and reviews methods for coupling workflow components with them. The core interfaces supporting adaptive unstructured mesh workflows are described in Section 3.1, together with examples of their use in bulk and atomic information transfers. Section 3.2 details the d...
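To make the distinction between the two interaction modes concrete before the detailed discussion in Section 3, the following C++ sketch contrasts a bulk in-memory handoff (a memory-resident stream holding the producer's serialized state) with an atomic handoff (per-entity queries through a small accessor interface). The names, data layout, and functions are hypothetical illustrations only; they are not the interfaces of PHASTA, Albany, Omega3P, or the mesh adaptation components described in this paper.

```cpp
// Illustrative sketch of bulk vs. atomic in-memory transfers.
// All names here (serialize_mesh_bulk, SolutionAccessor, ...) are hypothetical.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Bulk transfer: the producer serializes its entire data set into a
// memory-resident stream that the consumer parses in one shot, replacing a
// write-then-read of the same bytes through the parallel filesystem.
void serialize_mesh_bulk(const std::vector<double>& coords,
                         std::stringstream& stream) {
  stream << coords.size() << '\n';
  for (double c : coords) stream << c << ' ';
}

std::vector<double> deserialize_mesh_bulk(std::stringstream& stream) {
  std::size_t n = 0;
  stream >> n;
  std::vector<double> coords(n);
  for (double& c : coords) stream >> c;
  return coords;
}

// Atomic transfer: the consumer pulls individual values on demand through a
// small query interface, so no intermediate copy of the full data set is made.
class SolutionAccessor {
 public:
  explicit SolutionAccessor(const std::vector<double>& field) : field_(field) {}
  double value_at(std::size_t vertex) const { return field_.at(vertex); }
 private:
  const std::vector<double>& field_;
};

int main() {
  const std::vector<double> coords = {0.0, 1.0, 0.5, 2.0};

  // Bulk mode: one in-memory handoff of the whole data set.
  std::stringstream stream;
  serialize_mesh_bulk(coords, stream);
  const std::vector<double> received = deserialize_mesh_bulk(stream);

  // Atomic mode: per-entity queries against the producer's data.
  const SolutionAccessor accessor(received);
  std::cout << "vertex 2 -> " << accessor.value_at(2) << '\n';
  return 0;
}
```

In both modes the data stays in node-local memory, which is the property the bandwidth comparison above motivates; the choice between them is driven by how the existing analysis components are designed to produce and consume information.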