“…
Loop optimizations
    unrolling: 9, 17, 23, 24, 29, 50, 84, 90
    collapsing: 4, 6, 7, 13, 20, 21, 44, 54
    splitting: 22, 28
Blocking (tiling) in
    cache: 14, 15, 18, 20, 21, 22, 27, 39, 44, 52, 54, 69
    registers: 68, 69
Compile-time optimizations
    using pre-computed values: 35, 52
    specifying array and loop bounds at compile time: 6, 54
Compute-related optimizations
    reusing intermediate variables: 22, 35
    using conflict-detection instruction of AVX-512: 52, 85
    performing redundant computation to avoid data communication or atomic operations: 52, 82
Array transpose: 6, 79
…”
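
To make two of the listed techniques concrete, cache blocking (tiling) and array transpose, the following is a minimal C sketch of a tiled out-of-place transpose. It is an illustration only and is not taken from any of the cited works: the function name `transpose_blocked`, the `block` tile-size parameter, and the row-major layout are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the cited works): transpose a
 * rows x cols row-major matrix `src` into `dst` (cols x rows),
 * visiting the data in block x block tiles so that the source rows
 * and destination rows touched by one tile can stay in cache. */
void transpose_blocked(const double *src, double *dst,
                       size_t rows, size_t cols, size_t block)
{
    assert(block > 0); /* a zero tile size would never advance the loops */

    for (size_t ii = 0; ii < rows; ii += block) {
        for (size_t jj = 0; jj < cols; jj += block) {
            /* Clamp the tile at the matrix edges. */
            size_t i_end = (ii + block < rows) ? ii + block : rows;
            size_t j_end = (jj + block < cols) ? jj + block : cols;

            /* Transpose one tile. */
            for (size_t i = ii; i < i_end; ++i) {
                for (size_t j = jj; j < j_end; ++j) {
                    dst[j * rows + i] = src[i * cols + j];
                }
            }
        }
    }
}
```

A caller would pass a tile size chosen so that two tiles of `double` values fit in the targeted cache level; a suitable value depends on the hardware and is normally found by measurement rather than fixed in advance.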