```mlir
#map3 = affine_map<(d0, d1, d2) -> (d0, d2)>
#map4 = affine_map<(d0, d1, d2) -> (d2, d1)>
#map5 = affine_map<(d0, d1, d2) -> (d0, d1)>
%3 = linalg.generic {
      indexing_maps = [#map3, #map4, #map5],
      iterator_types = ["parallel", "parallel", "reduction"]
    } ins(%arg0, %1 : tensor<32x32xf32>, tensor<32x32xf32>)
      outs(%2 : tensor<32x32xf32>) {
  ^bb0(%in: f32, %in_2: f32, %out: f32):
    %5 = arith.mulf %in, %in_2 : f32
    %6 = arith.addf %out, %5 : f32
    linalg.yield %6 : f32
} -> tensor<32x32xf32>
```

…opment time. In MLIR, the partial resolution to this problem is called structured code generation (Vasilache et al., 2022), i.e., high-level operations such as torch.nn.Linear(32, 32) are first lowered to a structured representation, such as linalg.generic (see Listing 15), which is itself transformed and lowered to optimized loop nests (see Listing 16).
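To make the semantics of the linalg.generic listing concrete, here is a minimal Python sketch (names and helper are illustrative, not from the source): the indexing maps (d0, d2), (d2, d1), (d0, d1) with iterator types ["parallel", "parallel", "reduction"] describe the computation C[d0, d1] += A[d0, d2] * B[d2, d1], i.e., a matrix multiply accumulated into the output tensor, matching the mulf/addf region body.

```python
import numpy as np

def generic_matmul(a, b, c):
    """Reference loop nest for the linalg.generic above.

    a: (d0, d2), b: (d2, d1), c: (d0, d1) initial accumulator.
    The two outer loops are the "parallel" iterators; the inner
    loop is the "reduction" iterator.
    """
    d0, d2 = a.shape
    _, d1 = b.shape
    out = c.copy()
    for i in range(d0):              # parallel iterator d0
        for j in range(d1):          # parallel iterator d1
            for k in range(d2):      # reduction iterator d2
                # region body: %5 = mulf, %6 = addf, yield %6
                out[i, j] += a[i, k] * b[k, j]
    return out

a = np.random.rand(32, 32).astype(np.float32)
b = np.random.rand(32, 32).astype(np.float32)
c = np.zeros((32, 32), dtype=np.float32)
result = generic_matmul(a, b, c)
```

This naive triple loop is what the structured op abstracts away; the transformations mentioned next (tiling, lowering to optimized loop nests) rewrite exactly this iteration space.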