2020 IEEE International Solid-State Circuits Conference (ISSCC)
DOI: 10.1109/isscc19947.2020.9062989
7.4 GANPU: A 135TFLOPS/W Multi-DNN Training Processor for GANs with Speculative Dual-Sparsity Exploitation

Cited by 58 publications (23 citation statements)
References 2 publications
“…We evaluate the performance of EILE by training a fully-connected network with 2 hidden layers (network size: 784-512-256-10, total 1 MB of parameters) on the full MNIST [13] handwritten digit dataset. Activations are quantized to fixed-point Q(8,8) format while weights and gradients are quantized to Q(2,14) format for batch size of 1, where Q(m, n) denotes the quantization using m bits for the integer part and n bits for the fraction.…”
Section: Results (mentioning)
confidence: 99%
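The Q(m, n) notation in the excerpt can be illustrated with a minimal Python sketch. The helper name quantize_q is hypothetical, and the sketch assumes the common convention that m includes the sign bit; the excerpt does not state which convention is used.

import numpy as np

def quantize_q(x, m, n):
    # Round to the nearest multiple of 2**-n and saturate to the signed range
    # of an (m + n)-bit word; m is assumed to include the sign bit here.
    scale = 2.0 ** n
    lo = -(2.0 ** (m - 1))
    hi = 2.0 ** (m - 1) - 2.0 ** -n
    q = np.round(np.asarray(x, dtype=np.float64) * scale) / scale
    return np.clip(q, lo, hi)

# Activations in Q(8,8), weights/gradients in Q(2,14), as in the excerpt above.
acts = quantize_q(np.random.randn(4) * 10.0, m=8, n=8)
wg   = quantize_q(np.random.randn(4) * 0.1,  m=2, n=14)
print(acts, wg)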
“…To support different dataflows in FP and BP, [11] proposed a transposable PE array that exploits parallelism across multiple samples in a batch, so PE utilization decreases with smaller batch sizes. [14] did not report a PE utilization number, so its normalized throughput cannot be calculated. [15] proposed a 2D PE array that exploits activation sparsity in FP and BP and reports lower FP/BP utilization, but the batch size is not mentioned.…”
Section: Comparison With Other Work (mentioning)
confidence: 99%
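The forward/backward dataflow difference mentioned above can be sketched with plain matrix algebra: the forward pass (FP) reads the weight array directly, while backpropagation (BP) reads it transposed, which is what a transposable PE array serves without an explicit transpose step. The shapes below are illustrative only, not taken from any of the cited designs.

import numpy as np

B, I, O = 4, 512, 256          # batch, input features, output features (illustrative)
x  = np.random.randn(B, I)
W  = np.random.randn(I, O)
dy = np.random.randn(B, O)     # gradient arriving from the next layer

y  = x @ W                     # FP:  (B, I) x (I, O) -> (B, O), weights read as stored
dx = dy @ W.T                  # BP:  (B, O) x (O, I) -> (B, I), same weights, transposed access
dW = x.T @ dy                  # weight gradient accumulates over the batch dimension,
                               # which is why a batch-parallel mapping loses utilization when B is small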
“…This approach is well-established in both the floating-point and integer domains. For floating-point workloads, transprecision techniques have been demonstrated in domains such as traditional near-sensor data analytics [3] and training of neural networks [4]. In the integer domain, emerging fixed-point transprecision and mixed-precision techniques can be pushed further, down to extremely low bit-widths, for applications based on linear algebra [5] and inference of deep neural networks [6].…”
Section: Introduction (mentioning)
confidence: 99%
“…However, they underperform the previous perception-based processors [5,6], which accelerate low-bit-precision image classification and detection algorithms. Furthermore, CNNs used in image-to-image applications usually do not use the ReLU activation function, so the previous zero-skip processors [7] cannot achieve high performance.…”
Section: Introduction (mentioning)
confidence: 99%
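The zero-skipping point in this excerpt can be made concrete with a small sketch: zero-skip hardware exploits the exact zeros that ReLU produces, while activations common in image-to-image models (for example LeakyReLU or tanh in GAN generators) leave almost nothing to skip. The statistics below use random data and are only indicative, not measurements from any cited processor.

import numpy as np

pre_act = np.random.randn(1_000_000)

relu       = np.maximum(pre_act, 0.0)
leaky_relu = np.where(pre_act > 0.0, pre_act, 0.01 * pre_act)
tanh       = np.tanh(pre_act)

for name, a in [("ReLU", relu), ("LeakyReLU", leaky_relu), ("tanh", tanh)]:
    # Fraction of exactly-zero activations, i.e. the operands a zero-skip PE could drop.
    print(f"{name:>9}: {np.mean(a == 0.0):.1%} exactly-zero activations")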