2019
DOI: 10.1145/3359983
Unrolling Ternary Neural Networks

Abstract: The computational complexity of neural networks for large-scale or real-time applications necessitates hardware acceleration. Most approaches assume that the network architecture and parameters are unknown at design time, permitting usage in a large number of applications. This paper demonstrates, for the case where the neural network architecture and ternary weight values are known a priori, that extremely high throughput implementations of neural network inference can be made by customising the datapath and …
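The core idea behind "unrolling" is that when the weights are fixed at design time and restricted to {-1, 0, +1}, every multiply in a dot product collapses into an addition, a subtraction, or nothing at all, so the datapath can be specialized per neuron. A minimal Python sketch of this principle (the function names and the toy weight vector are illustrative, not taken from the paper):

```python
# Weights fixed at design time, restricted to the ternary set {-1, 0, +1}.
WEIGHTS = [1, 0, -1, 1, -1]

def ternary_dot_generic(x, w):
    """Generic inference: one multiply-accumulate per weight."""
    return sum(xi * wi for xi, wi in zip(x, w))

def ternary_dot_unrolled(x):
    """Datapath specialized ("unrolled") for WEIGHTS: no multiplies remain.
    +1 weights become additions, -1 weights become subtractions,
    and 0 weights are pruned from the circuit entirely."""
    return x[0] - x[2] + x[3] - x[4]

x = [3.0, 7.0, 2.0, -1.0, 4.0]
assert ternary_dot_generic(x, WEIGHTS) == ternary_dot_unrolled(x)
```

In hardware, the same specialization removes all multipliers and all zero-weight wiring from the circuit, which is what makes the extremely high throughput claimed in the abstract possible.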

Cited by 22 publications (6 citation statements) · References 26 publications

Citation statements:
“…After investigating recent VGG networks, we propose a similar VGG-style base network (VSBN) model for our task. Note that VGG-style networks are popularly used in many recent works, such as References [19,20]. After the 1st convolution block (CB), which consists of two repetitions of a convolutional layer with 64 kernels of size 3 × 3, abbreviated as 2x(64 3x3), and one max pooling layer, the output is 112 × 112 × 64 (see S2 in Figure 2a).…”
Section: B. VGG-Style Base Network (mentioning)
confidence: 99%
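As a quick sanity check of the shapes in the quoted description, here is a minimal sketch assuming a 224 × 224 × 3 input, 'same'-padded 3 × 3 convolutions, and 2 × 2 max pooling with stride 2 (the standard VGG configuration; these assumptions are mine, not stated in the excerpt):

```python
def conv3x3_same(h, w, c_out):
    # A 'same'-padded 3x3 convolution preserves the spatial size.
    return h, w, c_out

def maxpool2x2(h, w, c):
    # 2x2 max pooling with stride 2 halves each spatial dimension.
    return h // 2, w // 2, c

h, w, c = 224, 224, 3            # assumed RGB input (standard VGG)
for _ in range(2):               # 2x(64 3x3): two conv layers, 64 kernels each
    h, w, c = conv3x3_same(h, w, 64)
h, w, c = maxpool2x2(h, w, c)
print((h, w, c))                 # (112, 112, 64), matching S2 in the excerpt
```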
“…This work splits the whole GNN into several sub-layers and adopts a layer-wise hardware architecture [12,2,13,16] to map all the sub-layers on-chip, which is flexible and takes full advantage of the customizability of FPGAs. In addition, we perform the calculation for different sub-layers on their own units with dedicated optimizations to achieve low latency and high design throughput.…”
Section: Implementation of the Hardware Accelerator (mentioning)
confidence: 99%
“…This work proposes a custom Low Latency (LL)-GNN hardware architecture based on a layer-wise tailor-made pipeline to accelerate GNNs for particle detectors, using the GNN-based JEDI-net algorithm as an end-to-end application. The layer-wise architecture has been used to speed up CNNs [12,13,14,15,16] and RNNs [17], but few studies focus on accelerating GNNs. First, we propose custom strength reduction for matrix operations based on the characteristics of interaction-network-based GNNs with a fully connected graph as input, which avoids the expensive matrix multiplications of the adjacency matrix with the input feature matrix.…”
Section: Introduction (mentioning)
confidence: 99%
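The strength reduction mentioned in this excerpt has a simple algebraic core: for a fully connected graph on N nodes without self-loops, the adjacency matrix is A = J − I (all-ones minus identity), so A·X equals the column sums of X broadcast to every node minus X itself, turning an O(N²F) matrix product into an O(NF) reduction. A sketch of that general idea (my illustration, not necessarily the paper's exact formulation):

```python
import numpy as np

N, F = 6, 4                          # assumed toy sizes: 6 nodes, 4 features
X = np.random.rand(N, F)

# Naive aggregation: multiply by the dense adjacency matrix of a
# fully connected graph without self-loops, A = J - I.
A = np.ones((N, N)) - np.eye(N)
naive = A @ X                        # O(N^2 * F) work

# Strength-reduced form: A @ X = colsum(X) - X, only O(N * F) work.
reduced = X.sum(axis=0, keepdims=True) - X

assert np.allclose(naive, reduced)
```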
“…However, these methods only quantize network weights into three values; network outputs are still kept as real-valued variables. Recent studies on hardware implementations of ternary neural networks [1,24,20] have also shown that it is possible to achieve efficient and fast ternary-based computation on Field-Programmable Gate Arrays (FPGAs) [3].…”
Section: Ternary Quantization and Hardware (mentioning)
confidence: 99%
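For readers unfamiliar with what "quantize network weights into three values" looks like in practice, below is a minimal sketch of threshold-based ternarization in the style of Ternary Weight Networks (the 0.7 · mean|w| threshold and the fitted scale are that method's heuristics, offered here only as an illustration, not as this paper's scheme):

```python
import numpy as np

def ternarize(w, delta_factor=0.7):
    """Map real-valued weights to {-1, 0, +1} plus a shared layer scale.
    Threshold delta = delta_factor * mean(|w|) is a common heuristic."""
    delta = delta_factor * np.abs(w).mean()
    t = np.zeros_like(w)
    t[w > delta] = 1.0
    t[w < -delta] = -1.0
    # Least-squares scale fitted over the surviving non-zero entries.
    mask = t != 0
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return t, alpha

w = np.random.randn(8)
t, alpha = ternarize(w)
print(t, alpha)   # ternary codes and the shared scaling factor
```

It is exactly this {-1, 0, +1} weight structure that the unrolled FPGA datapaths cited above exploit: the scale factor can be folded into downstream layers, leaving only additions and subtractions in the critical path.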