“…• Challenge 3 - How to design a general-purpose accelerator that does not require rerunning the time-consuming synthesis/place/route flow. Many accelerators have been designed to boost computing performance and efficiency in application domains such as deep learning [5, 11, 12, 23, 31, 35, 64-69, 77, 87, 88], dense linear algebra [23,29,30,35,77], graph processing [4,17,25,26,39,70,89,91,92,95], genomic and bio analysis [8,9,13,14,33,38,51,76,81], and data sorting [10,52,60,63]. However, most of them are designed for one specific problem with a fixed input and output size. For FPGA accelerators, even with improved tools such as [17,77], producing a new design still takes many hours or even a few days because synthesis and place/route are slow.…”