Enlighten - Research publications by members of the University of Glasgow http://eprints.gla.ac.uk

and then for the data associated with the tile of argument 0. Furthermore, send_dim and send_idx can be used to send tile dimensions or tile indices, which could be used to drive more complex accelerators. Subsequent text refers to an opcode entry, such as "sA", simply as an opcode.

- opcode_flow: represents valid opcode/data transfer flows and respects the syntax scheme shown in Figure 8. Figure 6a-L23 shows an example, which defines a valid input A stationary flow (A is associated with argument 0) implemented with two opcodes, using the identifiers defined in the opcode_map. Additional valid examples for output C stationary and nothing stationary flows are shown in lines 24 and 25 of Figure 6a. The information in opcode_flow is parsed, and each set of parentheses is understood as a proxy that specifies a scope for sequential or nested for loops in the algorithm. Following this flow, the logic related to "sA" would be placed inside the second loop (Figure 6b-L8 to L10), and the logic related to "sBcCrC" would appear in the innermost loop (Figure 6b-L12 to L18). If the user chooses not to keep input A stationary, the opcode flow could become "(sA sB cC rC)", and all communication driver logic would be generated in the innermost loop.

The accel dialect: Before generating function calls to the DMA runtime library (described in Section III-A), we perform host code transformations (marked 5 in Figure 4) by lowering the linalg.generic operation, with the proposed trait, to standard MLIR dialects (scf, arith, memref) and a new dialect that we call accel. Operations in the accel dialect abstract host-accelerator transactions, such as initialization, memory transfers, and synchronization.
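To make the scoping rule for opcode_flow concrete, the following Python sketch parses a parenthesised flow string and reports the loop level at which each opcode's driver logic would land. It is an illustration only, not the AXI4MLIR implementation: the function names (parse_flow, opcode_depths, place_in_loops) are hypothetical, the string "(sA (sBcCrC))" is an assumed rendering of the flow in Figure 6a-L23, and the rule that the deepest scope maps to the innermost loop is inferred from the description above.

```python
from typing import List, Tuple, Union

# A scope is a list whose items are opcode names or nested scopes.
Scope = List[Union[str, "Scope"]]


def parse_flow(flow: str) -> Scope:
    """Parse a parenthesised opcode flow string into nested lists."""
    tokens = flow.replace("(", " ( ").replace(")", " ) ").split()
    root: Scope = []
    stack = [root]
    for tok in tokens:
        if tok == "(":
            child: Scope = []
            stack[-1].append(child)
            stack.append(child)
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')' in opcode flow")
            stack.pop()
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in opcode flow")
    return root


def opcode_depths(scope: Scope, depth: int = 0) -> List[Tuple[str, int]]:
    """Return (opcode, parenthesis depth) pairs in textual order."""
    pairs: List[Tuple[str, int]] = []
    for item in scope:
        if isinstance(item, list):
            pairs.extend(opcode_depths(item, depth + 1))
        else:
            pairs.append((item, depth))
    return pairs


def place_in_loops(flow: str, num_loops: int) -> List[Tuple[str, int]]:
    """Map each opcode to a 1-based loop level (num_loops = innermost).

    Assumption (not taken from the paper's code): the deepest scope in
    the flow corresponds to the innermost loop of the tiled loop nest.
    """
    pairs = opcode_depths(parse_flow(flow))
    if not pairs:
        raise ValueError("opcode flow contains no opcodes")
    max_depth = max(depth for _, depth in pairs)
    if max_depth > num_loops:
        raise ValueError("flow has more scopes than loops")
    return [(op, num_loops - max_depth + depth) for op, depth in pairs]


# Input A stationary flow on a three-loop tiled MatMul (assumed string).
a_stationary = place_in_loops("(sA (sBcCrC))", num_loops=3)
# Nothing stationary: every opcode sits in the innermost loop.
nothing_stationary = place_in_loops("(sA sB cC rC)", num_loops=3)
```

Under these assumptions, "sA" is assigned to loop level 2 and "sBcCrC" to level 3 for the first flow, which matches the placement described for Figure 6b, while the flattened flow assigns all four opcodes to level 3.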
Figure 9 presents the core accel operations and their semantics, providing examples of how these operations map onto our custom AXI DMA library calls. Additionally, Figure 6b shows how the accel operations are used in our MatMul example. Note that it is easier to perform analysis and transformations

HeteroFlow [44], an FPGA accelerator programming model, decouples algorithm specification from data placement optimization using a new primitive ".to()". This approach exposes data placement specification at various granularities, achieving efficient code generation while matching optimized manual HLS designs. HeteroFlow does not support arbitrary custom accelerators, as it is limited to accelerators co-designed with its framework (extended HeteroCL [45]). It also requires the new primitive to be used while describing the algorithm in Python, imposing manual application modification. Unlike HeteroFlow, AXI4MLIR utilizes MLIR to target languages employing linalg.generic operations during compilation, elimin...