“…-Key technical contribution of this work: A fully end-to-end programming model for AMD AI Engines, including a language frontend, optimal stream routing (using ILP and CP-SAT formulations of congestion-aware traffic assignment (Temelcan et al, 2020)), runtime memory management, and packaging, distribution, deployment to device; a novel stream broadcasting primitive for reducing the semantic gap between array broadcasting and stream switch configuration; a novel approach to metaprogramming MLIR in Python that enables using the same language for both metaprogramming and programming; finally, performant implementations of GEMM for the same architecture.…”