We present the pipelined accelerator (PACC), a directive-based programming framework for accelerating large-scale stencil computation on an accelerator device such as a graphics processing unit (GPU). PACC provides a collection of extended OpenACC directives that facilitate out-of-core stencil computation accelerated with temporal blocking. The framework includes a source-to-source translator that generates out-of-core OpenACC code from PACC code: large data is automatically decomposed into smaller chunks that are processed within the limited capacity of device memory. The generated code is optimised using temporal blocking to minimise CPU-GPU data transfer. It is further accelerated by a multithreaded pipeline engine that maximises data copy throughput and overlaps GPU execution with data transfer. In experiments, we applied the translator to three stencil computation codes. The out-of-core performance for 107 GB of data on an NVIDIA Tesla K40 GPU with 12 GB of memory reached 69.3 GFLOPS, which is 17% less than the in-core performance for 8 GB of data. We believe that the proposed directive-based approach can facilitate out-of-core stencil computation on a GPU.
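The chunk decomposition and temporal blocking described above can be illustrated with a minimal sketch. This is not PACC's generated OpenACC code. It is a pure-Python stand-in for a 1-D three-point stencil, and the names `step`, `in_core`, `out_of_core`, `chunk`, and `k` are hypothetical, chosen only for illustration. Each chunk is loaded together with a halo of `k` cells per side, advanced `k` time steps locally, and only its interior is written back, so one simulated transfer serves `k` sweeps at the cost of redundant halo computation.

```python
def step(u):
    """One 3-point averaging sweep; the two end cells are held fixed."""
    out = u[:]
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def in_core(u, steps):
    """Reference: the whole array fits in memory and is swept `steps` times."""
    for _ in range(steps):
        u = step(u)
    return u


def out_of_core(u, steps, chunk, k):
    """Sweep `u` chunk by chunk, advancing up to k time steps per chunk load.

    Assumption: slicing stands in for host-to-device and device-to-host copies.
    """
    n = len(u)
    done = 0
    while done < steps:
        kk = min(k, steps - done)          # time steps in this temporal block
        new = u[:]
        for start in range(0, n, chunk):
            end = min(start + chunk, n)
            lo = max(start - kk, 0)        # halo of kk cells on each side,
            hi = min(end + kk, n)          # clipped at the global boundary
            local = u[lo:hi]               # simulated copy to the device
            for _ in range(kk):
                local = step(local)        # kk sweeps with no further transfer
            # Halo cells are stale after kk sweeps; keep only the interior.
            new[start:end] = local[start - lo:end - lo]
        u = new
        done += kk
    return u


if __name__ == "__main__":
    u0 = [float((i * 7) % 11) for i in range(50)]
    same = out_of_core(u0, 10, chunk=8, k=4) == in_core(u0, 10)
    print("chunked result matches in-core result:", same)
```

The halo width equals the number of fused time steps because a fixed local edge cell contaminates one more neighbouring cell per sweep. A real implementation would additionally overlap the copies with computation, as the pipeline engine in the abstract does, which this sketch leaves out.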