Nearly 20 years after the birth of general-purpose GPU computing, the HPC landscape is now dominated by GPUs. After years of undisputed dominance by NVIDIA, new players have entered the arena convincingly, namely AMD and more recently Intel, whose devices currently power the top two systems in the Top500 ranking. Unfortunately, code porting remains a major problem, made even harder by the presence of multiple vendors, but at the same time the emergence of simplified standard paradigms offers an encouraging prospect for developers.
In this work, we analyze the porting and performance of STREAmS, a community code for compressible fluid dynamics, on architectures based on the Intel® Data Center GPU Max 1550 (formerly known as Ponte Vecchio, or PVC). First, we discuss the porting, which is based on the offload functionality of the OpenMP 5.x paradigm and, in particular, on a hybrid directives/API approach that fits smoothly into the multi-backend software ecosystem of STREAmS-2. Second, we analyze the performance of the code on two benchmark clusters powered by PVC, including the exascale Aurora cluster. The performance is evaluated at the different levels of parallelism involved, i.e., the intrinsic parallelism within a single PVC tile, the inter-tile parallelism within a GPU, the parallelism between the GPUs within a node, and that between the nodes within the cluster. The analysis shows that, although the implementation complexity of the OpenMP porting is limited, some important guidelines must be followed to achieve satisfactory performance. The PVC GPU shows about 40% higher performance than the NVIDIA A100 or AMD MI250X GPUs, which, however, were released about three years earlier. Both intra-node and inter-node scalability show good results. Overall, the introduction of PVC into the GPU computing HPC landscape represents a positive step forward for diversification and competitiveness in the sector.
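To illustrate the kind of hybrid directives/API approach referred to above, the following is a minimal, hypothetical sketch (not taken from the STREAmS source): device memory is managed explicitly through the OpenMP 5.x runtime API (omp_target_alloc, omp_target_memcpy, omp_target_free), while the computational kernel is expressed with a target directive. The array name and loop body are illustrative only.

```c
#include <stdio.h>
#include <omp.h>

#define N 1024

int main(void) {
    int dev  = omp_get_default_device();   /* offload target (e.g., a PVC tile) */
    int host = omp_get_initial_device();   /* host device id */

    /* API side: allocate the work array directly in device memory. */
    double *a = (double *) omp_target_alloc(N * sizeof(double), dev);

    /* Directive side: the kernel is offloaded with a target construct,
       reusing the API-allocated buffer via is_device_ptr. */
    #pragma omp target teams distribute parallel for is_device_ptr(a)
    for (int i = 0; i < N; ++i)
        a[i] = 2.0 * (double) i;

    /* Copy one value back to the host to verify the result. */
    double a1;
    omp_target_memcpy(&a1, a + 1, sizeof(double), 0, 0, host, dev);
    printf("a[1] = %f\n", a1);

    omp_target_free(a, dev);
    return 0;
}
```

Mixing explicit runtime-API memory management with directive-based kernels, as sketched here, is one common way to keep GPU data resident across many kernels while retaining the portability of standard OpenMP offload.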