“…ALT addresses the two limitations with 1) a generic layout transformation submodule that requires no re-implementation and is independent of the loop transformation, thereby decoupling the two; and 2) a higher-level autotuning module that orchestrates cross-layer joint tuning while guaranteeing efficiency. Recent loop optimization techniques [2,3,5,21,42,65,66,73,78,80,85,89,90,91], such as sophisticated cost models [3,5,42,73], aggressive operator fusion [21,40,46,50,80,90], and micro-kernel construction [91], are complementary to ALT.…”