An important goal of structured parallel programming has been to provide a design framework that strikes a balance between the level of abstraction built over the hardware and the degree of control given to the programmer to exploit hardware-specific features. Towards this goal, NVIDIA has released Thrust, an open-source design framework based on the C++ STL, in which developers express functionality in STL style without having to know the architectural details of the underlying parallel infrastructure. Although the framework is generic and portable, it does not provide a suitable abstraction for two-dimensional data, which is used heavily in many popular parallel algorithms. In this paper, we propose Thrust2D, an extension of Thrust that provides an abstraction for two-dimensional data, targeted at the structured grid class of applications. We took several structured grid examples from the Rodinia benchmark suite, the OpenCV framework, and the NVIDIA samples and rewrote them using Thrust2D. We demonstrate that, in some cases, Thrust2D reduces code complexity by nearly 80%, and that for 12 of the 17 applications tested, the kernel performance of the Thrust2D version is within 85% of the native CUDA version. When total execution time is considered, 14 of the 17 Thrust2D versions perform within 85% of their native CUDA counterparts. In some cases, the Thrust2D versions outperform the native versions.

KEYWORDS
algorithmic skeleton, cyclomatic complexity, dwarf, GPU, HPC, relative performance, shared memory access, structured grid
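To make the missing 2D abstraction more concrete, the following sketch contrasts two ways of writing a simple structured-grid update, a 5-point averaging stencil, in STL style. It is plain standard C++, not Thrust or Thrust2D code. The names `blur_flat`, `Window2D`, `transform2d_interior`, and `blur_2d` are hypothetical and only illustrate what a 2D abstraction could hide, assuming a row-major grid whose boundary cells are left unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Style 1: flat buffer + STL-style std::transform over linear indices.
// The functor decodes each index into (row, col) and computes neighbour
// offsets by hand: the bookkeeping a 1D-only abstraction pushes onto users.
std::vector<float> blur_flat(const std::vector<float>& in,
                             std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    std::vector<std::size_t> idx(in.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::vector<float> out(in.size());
    std::transform(idx.begin(), idx.end(), out.begin(), [&](std::size_t i) {
        const std::size_t r = i / cols, c = i % cols;
        if (r == 0 || c == 0 || r + 1 == rows || c + 1 == cols)
            return in[i];  // boundary cell: copied unchanged
        // 5-point average: centre, up, down, left, right
        return (in[i] + in[i - cols] + in[i + cols] + in[i - 1] + in[i + 1]) / 5.0f;
    });
    return out;
}

// Style 2 (hypothetical, NOT the Thrust2D API): the functor receives a
// window over a row-major grid and reads neighbours by relative offset.
struct Window2D {
    const std::vector<float>* data;  // row-major buffer
    long cols;                       // elements per row
    long r, c;                       // centre cell
    float operator()(long dr, long dc) const {
        return (*data)[static_cast<std::size_t>((r + dr) * cols + (c + dc))];
    }
};

// Hypothetical helper: apply f to every interior cell; boundary cells keep
// their input values. All index arithmetic lives here, not in user code.
template <typename F>
std::vector<float> transform2d_interior(const std::vector<float>& in,
                                        std::size_t rows, std::size_t cols, F f) {
    assert(in.size() == rows * cols);
    const long R = static_cast<long>(rows), C = static_cast<long>(cols);
    std::vector<float> out = in;
    for (long r = 1; r + 1 < R; ++r)
        for (long c = 1; c + 1 < C; ++c)
            out[static_cast<std::size_t>(r * C + c)] = f(Window2D{&in, C, r, c});
    return out;
}

// The same 5-point average written against the 2D window.
std::vector<float> blur_2d(const std::vector<float>& in,
                           std::size_t rows, std::size_t cols) {
    return transform2d_interior(in, rows, cols, [](const Window2D& w) {
        return (w(0, 0) + w(-1, 0) + w(1, 0) + w(0, -1) + w(0, 1)) / 5.0f;
    });
}
```

Both versions compute the same values; the difference is where the index arithmetic lives. With Thrust itself, the first style corresponds roughly to pairing `thrust::transform` with a `thrust::counting_iterator` and decoding the linear index inside the functor.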
INTRODUCTION

High-performance software must be written with great care to extract optimal performance from the underlying hardware. Writing such software has traditionally been (and remains) difficult for developers because of a range of complications: decomposing tasks for execution on heterogeneous infrastructure, data alignment, communication, synchronization, debugging, and so on. Because HPC infrastructure evolves rapidly with new capabilities, any infrastructure upgrade invariably requires the software to be tuned (and possibly rewritten) to get maximum performance from the upgraded hardware. Moreover, for commercial reasons, an application might have to run on widely different hardware simultaneously. One solution is to create a software abstraction for application developers that hides, as much as possible, the architectural details, data access complexity, and communication details of the underlying hardware. Such an abstraction should provide portability across various infrastructures. Lastly, the abstraction framework should give developers an appropriate mechanism to express the intention to exploit infrastructure-specific features in the code, so that the application can make optimal use of the hardware's computing capability and deliver performance close to that of a native implementation. NVIDIA Inc has taken the initiative to develop a lightweight, open-source framework based on the STL, called Thrus...