“…Our work on graph optimization builds on substantial prior work on optimizing computational graphs of tensor operations. Tensor contraction can be optimized via parallelization [22,23,41,49], efficient transposition [51], blocking [10,18,28,43], and by exploiting symmetry [15,48,49] and sparsity [22,24,32,39,47]. For complicated tensor graphs, specialized compilers such as XLA [52] and TVM [8] rewrite the computational graph to optimize program execution and memory allocation on dedicated hardware.…”