Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture 2017
DOI: 10.1145/3123939.3123982

Bit-pragmatic deep neural network computing

Abstract: We quantify a source of ineffectual computations when processing the multiplications of the convolutional layers in Deep Neural Networks (DNNs) and propose Pragmatic (PRA), an architecture that exploits it to improve performance and energy efficiency. The source of these ineffectual computations is best understood in the context of conventional multipliers, which internally generate multiple terms, that is, products of the multiplicand and powers of two, which added together produce the final product [1…
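The decomposition mentioned in the abstract can be illustrated in a few lines of Python. This is only a sketch of the idea, not the paper's hardware design, and the function names are invented for illustration: a multiplication breaks into one term per activation bit, and only the nonzero ("essential") bits actually contribute to the product.

def essential_bit_offsets(activation: int):
    # Bit positions (powers of two) of the nonzero bits of a non-negative activation.
    offsets, pos = [], 0
    while activation:
        if activation & 1:
            offsets.append(pos)
        activation >>= 1
        pos += 1
    return offsets

def shift_and_add_multiply(weight: int, activation: int) -> int:
    # Accumulate one shifted copy of the weight per essential activation bit;
    # the zero-bit terms a conventional multiplier still generates are skipped.
    return sum(weight << k for k in essential_bit_offsets(activation))

# 0b00101100 has only three essential bits, so only three terms are accumulated.
assert shift_and_add_multiply(7, 0b00101100) == 7 * 0b00101100

An activation with few essential bits therefore needs proportionally few terms, which is the source of the performance and energy gains the abstract refers to.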

Cited by 203 publications (87 citation statements)
References 24 publications

“…We use two baselines: the first is DaDianNao [9], the de facto design against which the relative performance of diverse DCNN accelerators is reported; the second is a state-of-the-art bit-serial implementation [5] (PRA), which also computes only the essential bits of the activations, and whose fp16 design we adopt for the weights for a fair comparison. We implement Tetris with two configurable modes, fp16 and int8, as mentioned in Section III.…”
Section: Discussion (mentioning)
confidence: 99%
“…This will significantly speed up the convolution operations [16]. (ii) A network quantized to fixed-point requires specialized integer arithmetic units (with various bitwidths) for efficient computation [1,18], whereas a network quantized with multiple binary bases uses the same operations as the binary networks mentioned above. Popular networks quantized with binary bases include Binary Networks and Multi-bit Networks.…”
Section: Related Work (mentioning)
confidence: 99%
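The statement above contrasts fixed-point quantization with quantization onto multiple binary bases. The sketch below is only an illustration of the generic scheme, not the exact method of the cited works: a vector is greedily decomposed onto a few +1/-1 bases, after which a dot product reduces to scaled binary dot products, the same operation binary networks use (an XNOR/popcount in hardware).

import numpy as np

def multibit_binary_quantize(x, num_bases=2):
    # Greedy residual fit: x is approximated by sum_i alpha_i * b_i with b_i in {-1, +1}.
    residual = np.asarray(x, dtype=np.float64).copy()
    alphas, bases = [], []
    for _ in range(num_bases):
        b = np.where(residual >= 0, 1.0, -1.0)
        alpha = float(np.mean(np.abs(residual)))
        alphas.append(alpha)
        bases.append(b)
        residual -= alpha * b
    return alphas, bases

def multibit_dot(xq, wq):
    # Every cross term is a dot product of two +1/-1 vectors, scaled by the
    # product of the corresponding scaling factors.
    ax, bx = xq
    aw, bw = wq
    return sum(a1 * a2 * float(np.dot(b1, b2))
               for a1, b1 in zip(ax, bx)
               for a2, b2 in zip(aw, bw))

x, w = np.random.randn(64), np.random.randn(64)
approx = multibit_dot(multibit_binary_quantize(x), multibit_binary_quantize(w))
print(approx, float(np.dot(x, w)))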
“…2) Simulation: Some prior work has also simulated machine learning workloads, but these papers used private simulators [33]-[36]. Since these simulators are not publicly available and few details have been published, it is difficult to compare their approaches to ours.…”
Section: A Machine Learning Framework (mentioning)
confidence: 99%