We discuss various strategies for parallelizing streamline simulators and present a single-phase shared-memory implementation. The choice of a shared-memory programming model is motivated by its suitability for streamline simulation, as well as by the rapid advance of multicore processors, which are readily available at low cost. We show that streamline-based methods are easily parallelizable on shared-memory architectures because they decompose the multidimensional transport equations into a large set of independent 1D transport solves. We tested both a specialized explicit load-balancing algorithm, which optimizes the streamline distribution across threads to minimize the time any thread is idle, and the dynamic load-balancing schedulers built into OpenMP on the shared-memory machines. Our results clearly indicate that the built-in schedulers are competitive with specialized load-balancing strategies as long as the number of streamlines per thread is sufficiently high, which is the case in field applications. The average workload per thread is then nominally insensitive to workload variations between individual streamlines, and any load-balancing advantage offered by explicit strategies is not sufficient to overcome the associated computational and parallel overhead. In terms of allocating streamlines or streamline segments to threads, we investigated both the distributed approach, in which threads are assigned streamline segments, and the owner approach, in which threads own complete streamlines. We found the owner approach to be the most suitable: the slight load-balancing advantage of the distributed approach is not enough to compensate for its additional overheads. Moreover, the owner approach allows straightforward reuse of existing sequential codes, which is not the case for the distributed approach under implicit or adaptive-implicit solution strategies.
The tracing and mapping stages in streamline simulation have low parallel efficiency. However, in real-field models the computational burden of the streamline solves is significantly heavier than that of the tracing and mapping stages, so the impact of these stages is limited. We tested the parallelization on three shared-memory systems: a Sun SPARC server with 24 dual-core processors; an eight-way Sun Opteron server, representative of the state-of-the-art shared-memory systems in use in the industry; and the recently released Sun Niagara II multicore machine, which has eight floating-point compute units on the chip. We test a single-phase flow problem on three heterogeneous reservoirs with varying well placements (this system gives the worst-case scenario, as the tracing and mapping costs are not negligible compared to the transport costs). For the SPARC and Opteron systems, we find parallel efficiencies ranging between 60% and 75% for the tracer flow problems. The sublinear speedup is mostly due to communication overheads in the tracing and mapping stages. In applications with more complex physics, the relati...
On cc-NUMA multiprocessors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality geographical locality. In this article, we study the performance of a parallel PDE solver with adaptive mesh refinement (AMR). The solver is parallelized using OpenMP, and the adaptive mesh refinement makes dynamic load balancing necessary. Due to the dynamically changing memory access pattern caused by the runtime adaptation, achieving a high degree of geographical locality is a challenging task. The main conclusions of the study are: (1) geographical locality is very important for the performance of the solver; (2) performance can be improved significantly using dynamic page migration of misplaced data; (3) a migrate-on-next-touch directive works well, whereas the first-touch strategy is less advantageous for programs exhibiting dynamically changing memory access patterns; and (4) the overhead of such migration is low compared to the total execution time.