“…In this section, we compare the computational cost of hexahedra, wedges, and pyramids with that of tetrahedra. The reported results are for tuned computational kernels, in which the number of elements processed per workgroup has been chosen to minimize the runtimes of the volume, surface, and update kernels for each element type [34,17]. As suggested in [26], automating this process is crucial for portable performance across architectures, especially for hybrid meshes, where parameters must be tuned for 12 separate kernels (the volume, surface, and update kernels for each of the four element types).…”
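The tuning described above amounts to a search over one parameter per kernel. The following is a minimal sketch, assuming an exhaustive sweep over a user-supplied list of candidate elements-per-workgroup values and a hypothetical callback `time_kernel(etype, kernel, k)` that would launch and time the given kernel on the target device. The names `autotune`, `toy_time`, and `_TOY_OPTIMUM` are illustrative only, and the toy timing model stands in for real measurements; its "optimal" values are invented and are not results from this study.

```python
import itertools
from typing import Callable, Dict, Iterable, Tuple

# Four element types x three kernels = 12 kernels to tune on a hybrid mesh.
ELEMENT_TYPES = ("tet", "hex", "wedge", "pyramid")
KERNELS = ("volume", "surface", "update")

# Hypothetical signature: (element type, kernel name, elements per workgroup) -> runtime.
TimingFn = Callable[[str, str, int], float]


def autotune(time_kernel: TimingFn,
             candidates: Iterable[int]) -> Dict[Tuple[str, str], int]:
    """Exhaustive search: for each (element type, kernel) pair, pick the
    elements-per-workgroup value with the smallest reported runtime."""
    candidates = list(candidates)
    best: Dict[Tuple[str, str], int] = {}
    for etype, kernel in itertools.product(ELEMENT_TYPES, KERNELS):
        # min() evaluates the key immediately, so the loop variables are bound correctly.
        best[(etype, kernel)] = min(
            candidates, key=lambda k: time_kernel(etype, kernel, k))
    return best


# Hypothetical stand-in for launching and timing a kernel on a device.
# The "optimal" values below are made up for illustration only.
_TOY_OPTIMUM = {"tet": 8, "hex": 2, "wedge": 4, "pyramid": 4}


def toy_time(etype: str, kernel: str, k: int) -> float:
    # Runtime grows linearly as k moves away from the (made-up) optimum;
    # this toy model ignores which kernel is being timed.
    return 1.0 + abs(k - _TOY_OPTIMUM[etype])


if __name__ == "__main__":
    table = autotune(toy_time, candidates=range(1, 17))
    for (etype, kernel), k in sorted(table.items()):
        print(f"{etype:8s} {kernel:8s} elements/workgroup = {k}")
```

In practice, `toy_time` would be replaced by a function that runs each kernel several times on the target architecture and returns, for example, the minimum or median wall-clock time. Because the sweep is repeated independently for all 12 (element type, kernel) pairs, its cost grows with the number of candidates, which is one reason automating it matters for hybrid meshes.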