“…
• support for both continuous [49] and discontinuous finite elements on uniform and adaptively refined meshes with hanging nodes and deformed elements,
• support for arbitrary polynomial expansions on quadrilateral and hexahedral element shapes, as well as tensor-product quadrature rules,
• minimization of arithmetic operations by using available symmetries, such as the even-odd decomposition [69], and a switch between the collocation derivative (5) for $n_q^{1D} \approx k+1$ quadrature points and an alternative variant based on derivatives of the original polynomials, as used in [49] and discussed in [29],
• flexible implementation of operations at quadrature points,
• vectorization across several elements to make optimal use of the SIMD units (AVX, AVX-512, AltiVec) of modern processors,
• applicability to modern multi-core CPUs as well as GPUs [51,57],
• data access optimizations such as element-based loops for DG elements [50,56],
• and an MPI implementation with tight data exchange as well as MPI-only and shared-memory models [43,48,54].…”
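The even-odd decomposition mentioned in the list can be illustrated on a single 1D matrix-vector product of the kind applied along each direction in sum factorization. The following is a minimal sketch, not the implementation of any particular library. It assumes the 1D matrix has the point symmetry A(i, j) = A(m-1-i, n-1-j), which holds, for instance, for shape values of a nodal basis on symmetric nodes evaluated at symmetric quadrature points. Derivative matrices satisfy the antisymmetric variant, which swaps the roles of the even and odd sums. All names (`EvenOddMatrix`, `make_even_odd`, `apply_even_odd`, `apply_naive`, `make_symmetric_test_matrix`) are hypothetical. Splitting the input into even and odd parts lets each pair of mirrored output rows share one half-length reduction, which roughly halves the number of multiplications.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Reference: dense product y = A x, with A stored row-major (m rows, n columns).
std::vector<double> apply_naive(const std::vector<double>& A, int m, int n,
                                const std::vector<double>& x) {
  std::vector<double> y(m, 0.0);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) y[i] += A[i * n + j] * x[j];
  return y;
}

// Hypothetical precomputed even/odd coefficients for a matrix satisfying
// A(i, j) == A(m-1-i, n-1-j); only the first ceil(m/2) rows are stored.
struct EvenOddMatrix {
  int m, n;
  std::vector<double> a;    // 0.5 * (A(i, j) + A(i, n-1-j)) for j < n/2
  std::vector<double> b;    // 0.5 * (A(i, j) - A(i, n-1-j)) for j < n/2
  std::vector<double> mid;  // A(i, n/2), used only when n is odd
};

EvenOddMatrix make_even_odd(const std::vector<double>& A, int m, int n) {
  const int rows = (m + 1) / 2, nh = n / 2;
  EvenOddMatrix E{m, n, std::vector<double>(rows * nh),
                  std::vector<double>(rows * nh),
                  std::vector<double>(rows, 0.0)};
  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < nh; ++j) {
      E.a[i * nh + j] = 0.5 * (A[i * n + j] + A[i * n + n - 1 - j]);
      E.b[i * nh + j] = 0.5 * (A[i * n + j] - A[i * n + n - 1 - j]);
    }
    if (n % 2 == 1) E.mid[i] = A[i * n + nh];
  }
  return E;
}

// Even-odd product: about ceil(m/2) * n multiplications instead of m * n.
std::vector<double> apply_even_odd(const EvenOddMatrix& E,
                                   const std::vector<double>& x) {
  const int m = E.m, n = E.n, nh = n / 2;
  // Split the input into symmetric (even) and antisymmetric (odd) parts.
  std::vector<double> e(nh), o(nh);
  for (int j = 0; j < nh; ++j) {
    e[j] = x[j] + x[n - 1 - j];
    o[j] = x[j] - x[n - 1 - j];
  }
  std::vector<double> y(m, 0.0);
  for (int i = 0; i < (m + 1) / 2; ++i) {
    double even = 0.0, odd = 0.0;
    for (int j = 0; j < nh; ++j) {
      even += E.a[i * nh + j] * e[j];
      odd += E.b[i * nh + j] * o[j];
    }
    // The middle column (odd n) maps onto itself, so it only enters the even sum.
    if (n % 2 == 1) even += E.mid[i] * x[nh];
    if (i == m - 1 - i) {
      y[i] = even;  // middle row (odd m): its b coefficients vanish by symmetry
    } else {
      y[i] = even + odd;          // row i
      y[m - 1 - i] = even - odd;  // mirrored row m-1-i reuses both sums
    }
  }
  return y;
}

// Hypothetical test matrix with the required point symmetry:
// A(i, j) = f(i, j) + f(m-1-i, n-1-j) for an arbitrary function f.
std::vector<double> make_symmetric_test_matrix(int m, int n) {
  auto f = [](int i, int j) {
    return 0.3 * (i + 1) + 0.01 * (j + 1) * (j + 2) + 0.05 * i * j;
  };
  std::vector<double> A(m * n);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) A[i * n + j] = f(i, j) + f(m - 1 - i, n - 1 - j);
  return A;
}
```

In an actual kernel, the `a`/`b` coefficients would be computed once per element type and the same loops applied along each tensor direction. Combined with vectorization across elements, each `double` in this sketch would become a SIMD lane holding the same quantity for several elements.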