2021
DOI: 10.1145/3460776

Configurable Multi-directional Systolic Array Architecture for Convolutional Neural Networks

Abstract: The systolic array architecture is one of the most popular choices for convolutional neural network hardware accelerators. The biggest advantage of the systolic array architecture is its simple and efficient design principle. Without complicated control and dataflow, hardware accelerators with the systolic array can calculate traditional convolution very efficiently. However, this advantage also brings new challenges to the systolic array. When computing special types of convolution, such as the small-scale co…
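As a rough illustration of the principle described in the abstract, the sketch below models an output-stationary systolic array computing a matrix multiplication in plain Python. It is a software analogy written for exposition only, not the architecture proposed in the paper; the function name systolic_matmul and the input-skewing scheme are assumptions made for the example.

import numpy as np

# Toy software model of an output-stationary systolic array computing C = A @ B.
# Each processing element PE(i, j) owns one accumulator and, on every cycle,
# multiplies the operands that the input skewing delivers to its row and column
# and adds the product locally. Illustrative only; not the paper's design.
def systolic_matmul(A, B):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    acc = np.zeros((n, m))            # one local accumulator per PE
    cycles = n + m + k - 2            # cycles needed once inputs are skewed
    for t in range(cycles):
        for i in range(n):            # PE row
            for j in range(m):        # PE column
                s = t - i - j         # reduction index reaching PE(i, j) at time t
                if 0 <= s < k:
                    acc[i, j] += A[i, s] * B[s, j]
    return acc

A = np.arange(6, dtype=float).reshape(2, 3)
B = np.arange(12, dtype=float).reshape(3, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)

Mapping a convolution onto the same grid is what lets the accelerator reuse this simple, regular schedule for CNN layers.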

Cited by 24 publications (3 citation statements)
References 28 publications
“…In order to improve the efficiency of CNNs, different hardware paradigms have been developed with the aim of massively reducing the volume of information movement. Numerous dataflow architectures have been proposed [8, 9, 10, 11, 12, 13, 14], whereby an array of processing elements (PEs), each containing their own limited memory called a register file, minimise the flow of data by storing intermediate results locally as well as operating on data received from their neighbouring PEs. Similarly, tensor processing units have been proposed which accelerate matrix multiplication through the cascading of multiplication and sum over a systolic array of PEs [15].…”
Section: Introduction (mentioning)
confidence: 99%
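The dataflow idea in the statement above, where each PE keeps intermediate results in a small local register file and exchanges operands only with its immediate neighbours, can be sketched with a toy PE model. The class and attribute names below are invented for the example and are not taken from the cited works.

# One processing element of a dataflow-style array: it stores its partial sum
# locally, consumes operands arriving from its west and north neighbours, and
# forwards them east and south, so data only ever moves between adjacent PEs.
class PE:
    def __init__(self):
        self.acc = 0.0           # partial sum held in the PE's local register file
        self.east_out = None     # operand passed on to the right-hand neighbour
        self.south_out = None    # operand passed on to the neighbour below

    def step(self, west_in, north_in):
        if west_in is not None and north_in is not None:
            self.acc += west_in * north_in    # local multiply-accumulate
        self.east_out, self.south_out = west_in, north_in

A grid of such PEs, fed with suitably skewed rows of one matrix and columns of the other, carries out the cascaded multiply-and-sum that the statement attributes to systolic tensor processing units.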
“…To increase applicability, flexibility, and PE utilization, configurable SAs can support various dataflow types [8], [9], [7]. (This work was supported by a research grant of Siemens EDA to Democritus University of Thrace for "HLS Research for Systems-on-Chip".)…”
Section: Introduction (mentioning)
confidence: 99%
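To make the notion of configurable dataflow concrete, the sketch below schedules the same multiply-accumulate work in two ways, depending on which operand is treated as stationary inside the array. It is a behavioural sketch only; the function name, the dataflow strings, and the loop orderings are assumptions for illustration, not the configuration mechanism of the paper or of the cited works.

import numpy as np

def gemm_with_dataflow(A, B, dataflow="weight_stationary"):
    # Behavioural model: both branches compute C = A @ B, but the loop structure
    # mirrors which operand would stay resident in the PE array.
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    if dataflow == "weight_stationary":
        # The weights B stay pinned in the array; rows of A stream through it.
        for i in range(n):
            C[i, :] = A[i, :] @ B
    elif dataflow == "output_stationary":
        # Each PE owns one output element and accumulates it across the k dimension.
        for i in range(n):
            for j in range(m):
                C[i, j] = np.dot(A[i, :], B[:, j])
    else:
        raise ValueError(f"unsupported dataflow: {dataflow}")
    return C

A = np.random.rand(4, 8)
B = np.random.rand(8, 5)
assert np.allclose(gemm_with_dataflow(A, B, "weight_stationary"),
                   gemm_with_dataflow(A, B, "output_stationary"))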
“…Unlike the state-of-the-art works, where the standard algorithm for convolution is modified to improve the performance, the proposed convolver can receive raw feature maps and weights to deliver an output whose precision is limited only by the number of bits in the data representation. In related works, the convolution operation with systolic arrays is usually conducted as a matrix multiplication [42], which requires a previous processing stage and irregular data memory access [10]. Tile-based data processing is often used too, but it implies a data processing stage after the convolution computation to compensate for the quantization error caused by not being able to apply the filters at the edges of the tiles [37].…”
Section: Introduction (mentioning)
confidence: 99%
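The "previous processing stage" mentioned in the statement above is typically an im2col lowering, which unfolds feature-map patches so that the convolution becomes the matrix multiplication a plain systolic array consumes. The sketch below shows that lowering for a single-channel map with stride 1 and no padding; the function names and shapes are illustrative assumptions, not taken from the cited papers.

import numpy as np

def im2col(x, kh, kw):
    # Unfold every kh x kw patch of a 2-D feature map into one column of a matrix.
    # Gathering these patches is the irregular memory-access pattern noted above.
    H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_as_matmul(x, w):
    # Convolution expressed as the GEMM that a systolic array executes directly.
    kh, kw = w.shape
    out = w.ravel() @ im2col(x, kh, kw)
    return out.reshape(x.shape[0] - kh + 1, x.shape[1] - kw + 1)

x = np.arange(25, dtype=float).reshape(5, 5)
w = np.ones((3, 3))
expected = [[x[i:i + 3, j:j + 3].sum() for j in range(3)] for i in range(3)]
assert np.allclose(conv2d_as_matmul(x, w), expected)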