Commercial reservoir simulators have traditionally been optimized for distributed parallel execution on Central Processing Units (CPUs). Recent advances in Graphics Processing Units (GPUs) have led to the development of GPU-native simulators and triggered a shift towards hardware-agnostic designs in existing CPU-based solutions. For the latter, the suite of algorithms and data structures employed for a given computation is implemented separately for each target device. This results in a hybrid approach, where some simulator components inherently expose enough instruction-level parallelism, or place high enough demands on memory bandwidth, to warrant running on the GPU, while others are better suited to the CPU. This paper examines the performance characteristics of a commercial black-oil reservoir simulator that was recently extended with GPU support.
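To make the hybrid approach concrete, the sketch below shows one common way such a per-device split can be organized: a single component contract with a CPU and a GPU implementation behind it, selected by a static assignment. All names (`Device`, `PropertyKernel`, `make_property_kernel`) and the placeholder density correlation are illustrative assumptions of the authors; the simulator's actual interfaces are proprietary and not described here.

```cpp
// Minimal sketch of a per-device component pattern; all names are hypothetical.
#include <memory>
#include <vector>

enum class Device { CPU, GPU };

// A simulator component exposes one algorithmic contract...
struct PropertyKernel {
    virtual ~PropertyKernel() = default;
    virtual void evaluate(const std::vector<double>& pressure,
                          std::vector<double>& density) = 0;
};

// ...with a separate implementation per target device.
struct CpuPropertyKernel : PropertyKernel {
    void evaluate(const std::vector<double>& pressure,
                  std::vector<double>& density) override {
        // A serial or threaded host loop in a real simulator.
        for (std::size_t i = 0; i < pressure.size(); ++i)
            density[i] = 800.0 + 1.0e-6 * pressure[i];  // placeholder correlation
    }
};

struct GpuPropertyKernel : PropertyKernel {
    void evaluate(const std::vector<double>& pressure,
                  std::vector<double>& density) override {
        // Stand-in for a device kernel launch (CUDA/HIP) over resident arrays;
        // kept as host code here so the sketch compiles without a GPU toolchain.
        for (std::size_t i = 0; i < pressure.size(); ++i)
            density[i] = 800.0 + 1.0e-6 * pressure[i];
    }
};

// A hybrid design can statically assign each component to one device.
std::unique_ptr<PropertyKernel> make_property_kernel(Device d) {
    if (d == Device::GPU) return std::make_unique<GpuPropertyKernel>();
    return std::make_unique<CpuPropertyKernel>();
}
```

A static, per-component assignment of this kind corresponds to the default CPU/GPU partition whose behavior the results below examine.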
Each simulation case distributes load across the various modules of a reservoir simulator differently, depending on the physical properties being modeled and the forecast data requested. To assess this, the scalability of the simulator is measured in detail on both the CPU and the GPU, for components where both implementations are available, focusing on the time spent in model initialization, property calculation, linearization, the linear solver, field management and reporting. This is done using test cases that stress the simulator along several axes: grid resolution, petrophysical property distributions, well count and the volume of reported data. The synthetic models that form the basis for these studies were designed to represent realistic reservoir engineering scenarios.
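Per-component timings of this kind are typically gathered with scoped timers around each simulator stage. The harness below is a minimal, hypothetical sketch of that technique; the stage names mirror the components listed above, but the code does not come from the simulator itself.

```cpp
// Hypothetical per-stage timing harness; not the simulator's instrumentation.
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

class ScopedTimer {
    using Clock = std::chrono::steady_clock;
    std::map<std::string, double>& totals_;
    std::string key_;
    Clock::time_point start_;
public:
    ScopedTimer(std::map<std::string, double>& totals, std::string key)
        : totals_(totals), key_(std::move(key)), start_(Clock::now()) {}
    ~ScopedTimer() {
        // Accumulate elapsed wall time into this stage's running total.
        totals_[key_] +=
            std::chrono::duration<double>(Clock::now() - start_).count();
    }
};

int main() {
    std::map<std::string, double> totals;
    // One scoped timer per simulator stage, per timestep:
    {
        ScopedTimer t(totals, "property calculation");
        /* ... evaluate PVT and saturation functions ... */
    }
    {
        ScopedTimer t(totals, "linearization");
        /* ... assemble Jacobian and residual ... */
    }
    {
        ScopedTimer t(totals, "linear solver");
        /* ... solve the linearized system ... */
    }
    for (const auto& [stage, seconds] : totals)
        std::printf("%-22s %8.3f s\n", stage.c_str(), seconds);
}
```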
The results show that a static partition between CPU- and GPU-assigned tasks, as employed by default in the simulator, performs well in scenarios where the work dedicated to grid-cell properties and the linear solution vastly outweighs the effort spent resolving well and aquifer connections, field management and reporting. This is expected for typical simulation cases. However, when one of the latter aspects becomes dominant, the balance can shift, leading to suboptimal hardware utilization. In conclusion, if performance is to be maintained across all possible inputs, a simulator whose components are all capable of running on both the CPU and the GPU is needed, together with a dynamic scheduling strategy in which the runtime data locality, volume and parallelism of each computation are considered when determining the target device for that operation.
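A minimal sketch of such a dynamic scheduling decision is given below. The cost model, thresholds and field names are illustrative assumptions only; a production scheduler would calibrate them from measured kernel and transfer costs on the target hardware.

```cpp
// Sketch of a dynamic device-scheduling heuristic; thresholds are assumed.
#include <cstddef>

enum class Device { CPU, GPU };

struct OperationProfile {
    Device data_residency;    // where the operands currently live
    std::size_t work_items;   // e.g. active cells or matrix rows
    double parallel_fraction; // share of the work that is data-parallel
};

// Choose a target device per operation from locality, volume and parallelism.
Device schedule(const OperationProfile& op) {
    // Small or poorly parallel work rarely amortizes a kernel launch.
    const std::size_t kGpuMinWork = 100000; // assumed threshold
    const double kGpuMinParallel = 0.9;     // assumed threshold
    bool gpu_worthwhile = op.work_items >= kGpuMinWork &&
                          op.parallel_fraction >= kGpuMinParallel;
    if (!gpu_worthwhile) return Device::CPU;
    // Favour the device that already holds the data, to avoid bus transfers,
    // unless the workload is large enough to amortize the copy.
    if (op.data_residency == Device::CPU && op.work_items < 10 * kGpuMinWork)
        return Device::CPU;
    return Device::GPU;
}
```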
To the authors’ knowledge, a study of the scalability of a commercial reservoir simulator across two different hardware architectures has not previously been conducted at this level of detail. The results on realistic models are presented in the hope that they will contribute to the discussion surrounding the benefits of modern computing hardware for reservoir simulation, and help drive deployment and design decisions for existing and future developments in both the commercial and academic spheres.