An efficient edge-based data structure has been developed to implement an unstructured vertex-based finite volume algorithm for the Reynolds-averaged Navier-Stokes equations on hybrid meshes. In the present approach, the data structure is tailored to the requirements of the vertex-based algorithm by considering data access patterns and cache efficiency. The required data are packed and allocated so that they lie close to each other in physical memory. The proposed data structure therefore increases cache performance and reduces computation time. As a result, the explicit flow solver shows a significant speedup in CPU time compared to other open-source solvers. A fully implicit version has also been implemented based on the PETSc library in order to improve the robustness of the algorithm. The algebraic equations resulting from the compressible Navier-Stokes equations and the one-equation Spalart-Allmaras turbulence model are solved in a monolithic manner using the restricted additive Schwarz preconditioner combined with the FGMRES Krylov subspace algorithm. To further improve computational accuracy, the multiscale metric-based anisotropic mesh refinement library PyAMG is used for mesh adaptation. The numerical algorithm is validated on classical benchmark problems such as the transonic turbulent flow around a supercritical RAE2822 airfoil and the DLR-F6 wing-body-nacelle-pylon configuration. The efficiency of the data structure is demonstrated by achieving up to an order of magnitude speedup in CPU time.
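The idea of an edge-based layout for a vertex-based finite volume scheme can be illustrated with a minimal sketch. It is not the authors' data structure: the names `pack_edges` and `accumulate_residuals`, the scalar upwind flux, the sort key used for locality, and the three-vertex mesh are all assumptions made for illustration. Each edge stores its two end vertices and an area-weighted dual-face normal in flat, contiguous arrays, and a single loop over edges scatters one flux to both vertices.

```python
from array import array


def pack_edges(edges):
    """Pack (i, j, nx, ny) edge tuples into flat, contiguous arrays.

    Sorting by vertex index is an assumed, simple way to make consecutive
    edges touch nearby vertices; it is not taken from the paper.
    """
    ordered = sorted(edges, key=lambda e: (min(e[0], e[1]), max(e[0], e[1])))
    edge_i = array("i", [e[0] for e in ordered])
    edge_j = array("i", [e[1] for e in ordered])
    normal_x = array("d", [e[2] for e in ordered])
    normal_y = array("d", [e[3] for e in ordered])
    return edge_i, edge_j, normal_x, normal_y


def accumulate_residuals(n_vertices, packed, u, ax, ay):
    """Loop once over edges and scatter an upwind flux to both end vertices.

    u is a scalar per vertex, (ax, ay) a constant advection velocity, and the
    normal of each edge is assumed to point from vertex i to vertex j.
    """
    edge_i, edge_j, normal_x, normal_y = packed
    res = array("d", [0.0]) * n_vertices
    for e in range(len(edge_i)):
        i, j = edge_i[e], edge_j[e]
        vn = ax * normal_x[e] + ay * normal_y[e]
        flux = vn * (u[i] if vn >= 0.0 else u[j])  # first-order upwind
        res[i] -= flux  # flux leaves the control volume of vertex i
        res[j] += flux  # and enters the control volume of vertex j
    return res


# Hypothetical closed three-vertex mesh; the normals are made-up values.
edges = [(2, 0, -0.5, 0.5), (0, 1, 1.0, 0.0), (1, 2, -0.5, -0.5)]
u = array("d", [1.0, 2.0, 3.0])
residual = accumulate_residuals(3, pack_edges(edges), u, 1.0, 0.0)
```

Because every flux is subtracted from one vertex and added to the other, the residuals sum to zero on a closed set of edges, which is the discrete conservation property that makes the edge loop attractive.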
On modern hardware architectures, the performance of Flux Reconstruction (FR) methods can be limited by memory bandwidth. In a typical implementation, these methods are realized as a chain of distinct kernels. Often, a dataset that has just been written to main memory by one kernel is read back immediately by the next. One way to avoid such redundant expenditure of memory bandwidth is kernel fusion. However, on a practical level, kernel fusion requires that the source for all kernels be available, which prevents calls to certain third-party library functions. Moreover, it can add substantial complexity to a codebase. An alternative to full kernel fusion is cache blocking. For this to be effective, however, the CPU cache has to be sufficiently large. Historically, the size of the L1 and L2 caches prevented cache blocking for high-order CFD applications. In recent years, however, the size of the L2 cache has grown from around 0.25 MiB to 1.25 MiB, making it possible to apply cache blocking to high-order CFD codes. In this approach, kernels remain distinct and are executed one after another on small chunks of data that fit in the cache, rather than on full datasets. These chunks stay in the cache, and whenever a kernel requests data that is already there, memory bandwidth is conserved. In this study, a data structure that facilitates cache blocking is considered, and a range of kernel grouping configurations for an FR-based Euler solver are examined. A theoretical study is conducted for hexahedral elements with no anti-aliasing at p = 3 and p = 4 in order to determine the predicted performance of a few kernel grouping configurations. Then, these candidates
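The difference between running a kernel chain on full datasets and running it block by block can be sketched as follows. This is only an illustration of the loop structure under stated assumptions: the two kernels (`kernel_scale`, `kernel_shift`), the block size, and the data are hypothetical, the kernels are assumed to be purely element-local, and plain Python will not show the bandwidth savings that a compiled implementation would target.

```python
def kernel_scale(src, dst, lo, hi):
    """First kernel in the chain: writes dst[lo:hi] from src[lo:hi]."""
    for k in range(lo, hi):
        dst[k] = 2.0 * src[k]


def kernel_shift(src, dst, lo, hi):
    """Second kernel: reads what the first kernel has just written."""
    for k in range(lo, hi):
        dst[k] = src[k] + 1.0


def run_unblocked(u):
    """Each kernel sweeps the full dataset before the next one starts."""
    n = len(u)
    tmp = [0.0] * n
    out = [0.0] * n
    kernel_scale(u, tmp, 0, n)
    kernel_shift(tmp, out, 0, n)
    return out


def run_blocked(u, block_size):
    """Kernels stay distinct but run back to back on one small chunk at a time.

    With a chunk small enough to fit in cache, the second kernel would find
    the intermediate values still resident instead of re-reading main memory.
    """
    n = len(u)
    tmp = [0.0] * n
    out = [0.0] * n
    for lo in range(0, n, block_size):
        hi = min(lo + block_size, n)
        kernel_scale(u, tmp, lo, hi)
        kernel_shift(tmp, out, lo, hi)
    return out


u = [float(k) for k in range(10)]
unblocked = run_unblocked(u)
blocked = run_blocked(u, block_size=4)
```

Both orderings produce the same result here because neither kernel reads data outside its own index range; kernels that couple neighbouring elements would constrain which kernels can be grouped into one block loop.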