Synchronization operations like barriers are frequently seen in parallel OpenMP programs, where an inefficient implementation can severely limit the application performance. While synchronization optimization has been heavily studied on traditional x86 architectures, there is no consensus on how synchronization can be best implemented on the ARMv8 multicore CPUs. This paper presents a study of OpenMP synchronization implementation on two representative ARMv8 multi-core architectures, Phytium 2000+ and ThunderX2, by considering various OpenMP synchronization mechanisms offered by two mainstreamed OpenMP compilers, GCC and LLVM. Our evaluation compares the performance, overhead and scalability of both compiler implementations. We show that there is no "one-fits-forall" synchronization mechanism, and the efficiency of a scheme varies across hardware architectures and thread parallelism. We then share our insights and discuss how OpenMP synchronization operations can be better optimized on emerging ARMv8 multicores, offering quantified results for future research directions.