2019 IEEE International Solid-State Circuits Conference (ISSCC)
DOI: 10.1109/isscc.2019.8662476

7.1 An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC

Cited by 89 publications (31 citation statements)
References 4 publications
“…The NPU of the Exynos 9820 supports only quantized inference and consists of the controller and two cores (Fig. 2) having 1024 multiply-accumulate (MAC) units [78]. The NPU controller has a CPU, a direct memory access (DMA) unit, code SRAM and a network controller.…”
Section: Samsung Chipsets / Eden SDK (mentioning)
confidence: 99%
“…When running the computations, the NPU can also skip weights that are zero to improve convolution efficiency. A much more detailed description of the Exynos NPU can be found in [78]. We strongly recommend this article to everyone interested in the general functioning of NPUs, as it provides an excellent overview of all network/data processing stages and possible bottlenecks.…”
Section: Samsung Chipsets / Eden SDK (mentioning)
confidence: 99%
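The zero-weight skipping mentioned in this statement can be illustrated in software. This is only a minimal sketch with hypothetical names; the Exynos NPU performs the equivalent gating in hardware, leaving MAC units idle on zero weights rather than branching:

def sparse_dot(weights, activations):
    # Accumulate only the terms whose weight is non-zero.
    # Illustration only: a real sparsity-aware NPU gates its MAC
    # units in hardware instead of executing a software branch.
    acc = 0
    skipped = 0
    for w, a in zip(weights, activations):
        if w == 0:           # zero weight: skip the multiply-accumulate
            skipped += 1
            continue
        acc += w * a         # quantized multiply-accumulate
    return acc, skipped

# One sparse filter slice against one activation window:
acc, skipped = sparse_dot([3, 0, 0, -2, 0, 5, 0, 0], [10, 7, 7, 4, 1, 2, 9, 9])
print(acc, skipped)          # 32, with 5 of 8 MACs skipped

The higher the weight sparsity, the more MAC cycles (and energy) are saved, which is the efficiency gain the citing authors refer to.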
“…In the same vein, to assess the impact of precision quantization on both the accuracy and robustness of the developed audio models, we investigated the performance of each model across four evaluation setups under different bitwidths in Figure 9(a) for ε = 0.001. For our experiments, we employed three reduced-precision representations using the quantization method discussed in §2.3.2: 16-bit (INT16) fixed-point, which has been extensively studied and yields accuracy similar to the FP32 baseline [11,35]; and 8-bit (INT8) and 4-bit (INT4) fixed-point, which are suitable for hardware platforms with native support for low-precision arithmetic, such as the INT8-enabled mobile deep-learning accelerators by Qualcomm [49], Nvidia [50,51], Arm [52] and Samsung [53], and INT4 on Nvidia's Turing GPUs [54,55]. As shown in Figure 9(a), a bitwidth of 16 bits closely follows the behavior of the original, non-quantized models across all types of attacks.…”
Section: Transferability Of Adversarial Audio Examples (mentioning)
confidence: 99%
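As a rough, hypothetical illustration of what such reduced-precision fixed-point representations do to values (a simple symmetric scheme; the actual method in the cited work's §2.3.2 may differ), consider:

def quantize_symmetric(values, bits):
    # Map floats onto signed fixed-point with `bits` bits and back,
    # so the rounding error introduced by the bitwidth is visible.
    qmax = 2 ** (bits - 1) - 1                      # 32767 / 127 / 7
    scale = (max(abs(v) for v in values) / qmax) or 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [qi * scale for qi in q]

x = [0.02, -0.71, 0.33, 0.99]
for bits in (16, 8, 4):
    print(bits, quantize_symmetric(x, bits))
# INT16 is nearly lossless, INT8 shows small rounding error, INT4 is visibly coarse.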
“…Neural Network (NN) Inference: Hardware accelerators for NN inference are rapidly becoming an integral part of systems-on-chip (SoCs). They typically comprise large arrays of multiply-add (MADD) circuits [4], providing a considerable increase in inference speed. To meet ever tighter throughput constraints, the state of the art traditionally trades off inference speed against accuracy: the precision of the MADD circuits is reduced (i.e., higher frequency is achieved) at the cost of some accuracy loss [4], [5].…”
Section: Introduction (mentioning)
confidence: 99%
“…They typically comprise large arrays of multiply-add (MADD) circuits [4], providing a considerable increase in inference speed. To meet ever tighter throughput constraints, the state of the art traditionally trades off inference speed against accuracy: the precision of the MADD circuits is reduced (i.e., higher frequency is achieved) at the cost of some accuracy loss [4], [5]. However, to satisfy such tight throughput constraints, NN accelerators integrate thousands of MADD units [4], resulting in a significant increase in energy consumption, which might not be tolerated.…”
Section: Introduction (mentioning)
confidence: 99%
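A minimal, hypothetical sketch of the speed/accuracy trade-off these two statements describe: one reduced-precision MADD lane quantizes its operands to INT8, accumulates the integer products exactly, and rescales, so the only error relative to the floating-point reference comes from the reduced operand precision.

def int8_madd_dot(w, a):
    # Model of one reduced-precision MADD lane: INT8 operands,
    # exact integer accumulation, result rescaled to floating point.
    def quant(xs):
        scale = (max(abs(x) for x in xs) / 127) or 1.0
        return [round(x / scale) for x in xs], scale

    qw, sw = quant(w)
    qa, sa = quant(a)
    acc = sum(wi * ai for wi, ai in zip(qw, qa))   # integer MADDs
    return acc * sw * sa

w = [0.12, -0.80, 0.45, 0.07]
a = [1.50, 0.25, -0.60, 2.00]
ref = sum(wi * ai for wi, ai in zip(w, a))         # float reference
print(ref, int8_madd_dot(w, a))                    # -0.15 vs. roughly -0.152

Narrower multipliers are smaller and faster, which is why an array of thousands of MADD units favors reduced precision, at the cost of exactly this kind of rounding error.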