2022
DOI: 10.1109/tc.2022.3211430

End-to-End Synthesis of Dynamically Controlled Machine Learning Accelerators

Abstract: Edge systems are required to autonomously make real-time decisions based on large quantities of input data under strict power, performance, area, and other constraints. Meeting these constraints is only possible by specializing systems through hardware accelerators purposefully built for machine learning and data analysis algorithms. However, data science evolves at a quick pace, and manual design of custom accelerators has high non-recurrent engineering costs: general solutions are needed to automatically and…

Cited by 2 publications (3 citation statements)
References 43 publications

“…The HERMES project use cases include applications based on artificial intelligence, which might contain multiple parallel execution flows (i.e., coarse-grained parallelism); when synthesized through an HLS tool, the complexity of the finite state machine controllers for such applications grows exponentially, leading to considerable resource consumption and latency overheads. To solve this problem, Bambu has been extended to efficiently synthesize dynamically controlled accelerators and integrated in a compiler-based toolchain that facilitates the extraction of coarse-grained tasks and data dependencies from an input machine learning application designed and trained in a high-level programming framework [14]. Future developments will focus on other optimization techniques and architectural templates that can answer the specific needs of artificial intelligence algorithms and the requirements of aerospace applications.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…Most parallel computing approaches optimize implementation for a single device using a specific implementation framework: for example, Python/HLS for FPGAs [6], Python/CUDA for GPUs [7], or C/HLS for ASICs/FPGAs [8,9]. Ad hoc solutions do not allow changes to the originally defined computing architecture to support algorithms other than those for which they were created.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…All of these devices are used simultaneously. This could result in a more flexible computation approach than developing a solution for a specific device using a custom design flow, as presented in [6][7][8][9]; the main advantage of heterogeneous computing lies in its flexibility and generality, possibly at the cost of performance.…”
Section: Introduction; citation type: mentioning
confidence: 99%