Comparing the behavior of OpenMP Implementations with various Applications on two different Fujitsu A64FX platforms

Michalowicz, Benjamin; Raut, Eric; Kang, Yan; Curtis, Tony; Chapman, Barbara; Oryspayev, Dossay

doi:10.1145/3437359.3465592

Cited by 10 publications

(1 citation statement)

References 6 publications

“…They found that the LLVM compiler generally outperformed on Xeon processors, but the gcc compiler outperformed for a small number of threads. With regard to ARMv8 architecture, Michalowicz et al [16] analyzed the performance of OpenMP applications with different compilers on the A64FX platform. But their evaluation was based on practical applications rather than specific directives.…”

Section: Measurement of OpenMP Overhead (mentioning)

confidence: 99%

Characterizing OpenMP Synchronization Implementations on ARMv8 Multi-Cores

Wang, Gao, Fang, et al. 2021

2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th In

Synchronization operations like barriers are frequently seen in parallel OpenMP programs, where an inefficient implementation can severely limit the application performance. While synchronization optimization has been heavily studied on traditional x86 architectures, there is no consensus on how synchronization can be best implemented on ARMv8 multicore CPUs. This paper presents a study of OpenMP synchronization implementation on two representative ARMv8 multi-core architectures, Phytium 2000+ and ThunderX2, by considering various OpenMP synchronization mechanisms offered by two mainstream OpenMP compilers, GCC and LLVM. Our evaluation compares the performance, overhead and scalability of both compiler implementations. We show that there is no "one-fits-for-all" synchronization mechanism, and the efficiency of a scheme varies across hardware architectures and thread parallelism. We then share our insights and discuss how OpenMP synchronization operations can be better optimized on emerging ARMv8 multicores, offering quantified results for future research directions.
