Modulo scheduling is the premier technique for throughput maximization of loops in high-level synthesis by interleaving consecutive loop iterations. The number of clock cycles between data insertions is called initiation interval (II). For throughput maximization, this value should be as low as possible; therefore its minimization is the main optimization goal.
Despite its long historical existence, modulo scheduling always remained a relevant research topic over the last years with many exact and heuristic algorithms available in literature.
Nevertheless, we are able to leverage the scalability of modern Boolean Satisfiability (SAT) solvers to outperform state-of-the-art ILP-based algorithms for latency-optimal modulo scheduling for both integer and rational IIs. Our algorithm is able to compute valid modulo schedules for the whole CHStone and MachSuite benchmark suites, with 99% of the solutions being proven to be throughput-optimal for a timeout of only 10 min per candidate II. For various time limits, not a single tested scheduler from the state-of-the-art is able to compute more verified optimal solutions or even a single schedule with a higher throughput than our proposed approach. Using an HLS toolflow we show that our algorithm can be effectively used to generate Pareto-optimal FPGA implementations regarding throughput and resource usage.