Summary
This paper focuses on parallel implementations of three two‐dimensional explicit numerical methods on an Intel® Xeon® Scalable Processor and the Knights Landing coprocessor. The study compares the performance of a hybrid parallel programming approach that combines the Message Passing Interface (MPI) with Open Multi‐Processing (OpenMP), and of a pure MPI implementation, used with two thread binding policies, against an improved OpenMP‐based implementation of three explicit finite‐difference methods for solving partial differential equations on shared‐memory multicore and manycore systems. Specifically, the improved OpenMP‐based version uses a strategy that synchronizes adjacent threads and eliminates the implicit barriers of a naïve OpenMP‐based implementation. The experiments show that the most suitable approach depends on several characteristics related to the nonuniform memory access (NUMA) effect and load balancing, such as the size of the MPI domain and the number of synchronization points used in the parallel implementation. For the algorithms that use four and five synchronization points, the hybrid MPI/OpenMP approaches yielded better speedups than the other versions on both systems. For the method that employs only one synchronization point, however, the pure MPI‐based strategy achieved better results than the other proposed approaches.
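The idea of synchronizing adjacent threads instead of using a global barrier can be illustrated with a minimal sketch. This is not the implementation evaluated in this paper: it is written in Python with the standard `threading` module rather than OpenMP, it uses a one‐dimensional explicit heat‐equation stencil instead of the two‐dimensional methods, and all names (`solve_serial`, `solve_neighbor_sync`, `done`, `r`) are hypothetical. Because of Python's global interpreter lock, the sketch shows only the synchronization pattern and gives no speedup. Each thread owns a contiguous block of grid points and, before computing time step k, waits only until its left and right neighbors have completed k steps.

```python
import threading


def solve_serial(u0, steps, r=0.25):
    """Reference solver: explicit update of interior points, fixed end values."""
    u = list(u0)
    for _ in range(steps):
        new = list(u)
        for i in range(1, len(u) - 1):
            new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        u = new
    return u


def solve_neighbor_sync(u0, steps, num_threads, r=0.25):
    """Same update, but each thread waits only for its adjacent threads."""
    n = len(u0)
    interior = n - 2
    if not 1 <= num_threads <= interior:
        raise ValueError("need 1 <= num_threads <= number of interior points")

    # Two buffers: step k reads bufs[k % 2] and writes bufs[(k + 1) % 2].
    bufs = [list(u0), list(u0)]
    # done[t] = number of time steps thread t has completed so far.
    done = [0] * num_threads
    cond = threading.Condition()
    # Contiguous, non-empty blocks covering the interior points 1 .. n-2.
    bounds = [1 + (interior * t) // num_threads for t in range(num_threads + 1)]

    def worker(t):
        lo, hi = bounds[t], bounds[t + 1]
        neighbors = [nb for nb in (t - 1, t + 1) if 0 <= nb < num_threads]
        for k in range(steps):
            # Point-to-point wait: only adjacent threads must have finished
            # k steps. This guarantees their level-k boundary values are
            # written and that they no longer read the buffer we overwrite.
            with cond:
                cond.wait_for(lambda: all(done[nb] >= k for nb in neighbors))
            src, dst = bufs[k % 2], bufs[(k + 1) % 2]
            for i in range(lo, hi):
                dst[i] = src[i] + r * (src[i - 1] - 2.0 * src[i] + src[i + 1])
            # Publish progress and wake any neighbor waiting on this thread.
            with cond:
                done[t] = k + 1
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return bufs[steps % 2]


if __name__ == "__main__":
    u0 = [0.0] * 16
    u0[8] = 1.0
    print(solve_neighbor_sync(u0, 25, 4) == solve_serial(u0, 25))
```

In this sketch a thread that finishes early may run ahead of distant threads, since it is held back only by its immediate neighbors; a global barrier would instead stall every thread until the slowest one arrives at each time step.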