Laconic deep learning inference acceleration
Proceedings of the 46th International Symposium on Computer Architecture (ISCA 2019)
DOI: 10.1145/3307650.3322255

Cited by 83 publications (28 citation statements)
References 55 publications
“…[29] solves the problem of reducing the number of channels in the compression training process by combining or splitting multiple small arrays. [20], [9] Gope et al. and Sharify et al. [37] deploy low-precision networks and use bit-wise calculations to achieve low-power designs. However, none of these designs consider the emerging special types of convolution, including the small-scale convolution and DWConv.…”
Section: Related Work
confidence: 99%
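The "bit-wise calculations" mentioned in this excerpt refer to bit-serial arithmetic of the kind Laconic exploits: a multiplication is decomposed into shifted additions driven only by the nonzero (essential) bits of the operands, so the work scales with how many bits are actually set rather than with the full bit-width. A minimal Python sketch of that idea follows; the function names, the 8-bit fixed-point assumption, and the dataflow are illustrative and not taken from the cited designs.

```python
import numpy as np

def essential_terms(x, bits=8):
    """Return the nonzero power-of-two terms of a fixed-point value as (sign, exponent) pairs."""
    sign = -1 if x < 0 else 1
    mag = abs(int(x))
    return [(sign, i) for i in range(bits) if (mag >> i) & 1]

def bit_serial_mac(activations, weights, bits=8):
    """Accumulate a dot product by processing only the nonzero bits of each operand pair.

    The number of shifted partial products equals the number of essential-bit pairs,
    not bits*bits, which is the intuition behind bit-wise low-power accelerators.
    """
    acc = 0
    for a, w in zip(activations, weights):
        for sa, ea in essential_terms(a, bits):
            for sw, ew in essential_terms(w, bits):
                acc += sa * sw * (1 << (ea + ew))  # one shifted add per essential-bit pair
    return acc

# Sanity check: the result matches an ordinary dot product on 8-bit integers.
a = np.array([3, 0, -5, 16], dtype=np.int32)
w = np.array([7, 12, 2, -1], dtype=np.int32)
assert bit_serial_mac(a, w) == int(np.dot(a, w))
```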
“…Static pruning removes weights in the offline training stage and applies the same compressed model to all samples. Unstructured pruning [9,11,30,37] targets removing individual weights with minimal contribution. The limitation of unstructured weight pruning is that dedicated hardware and libraries [37] are needed to achieve speedup from the compression.…”
Section: Related Work
confidence: 99%
“…Unstructured pruning [9,11,30,37] targets removing individual weights with minimal contribution. The limitation of unstructured weight pruning is that dedicated hardware and libraries [37] are needed to achieve speedup from the compression. Structured pruning is becoming a more practical solution, where filters or blocks are ranked and pruned based on a criterion [7,13,15,27,32,34].…”
Section: Related Work
confidence: 99%
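The contrast these excerpts draw between unstructured and structured pruning can be made concrete with a small NumPy sketch. The 90% sparsity target, the L1-norm filter criterion, and the tensor shapes below are illustrative assumptions rather than details of the cited works.

```python
import numpy as np

def unstructured_prune(weight, sparsity=0.9):
    """Zero out the smallest-magnitude individual weights (unstructured pruning).

    The result is sparse but irregular, so speedups typically require dedicated
    sparse hardware or libraries, as the excerpt notes.
    """
    threshold = np.quantile(np.abs(weight).ravel(), sparsity)
    return np.where(np.abs(weight) > threshold, weight, 0.0)

def structured_prune(weight, keep_filters=8):
    """Drop whole output filters ranked by L1 norm (one common structured criterion).

    Entire filters of a conv weight tensor shaped (out_channels, in_channels, kH, kW)
    are removed, so the pruned model stays dense and runs on standard hardware.
    """
    scores = np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)  # L1 norm per filter
    keep = np.sort(np.argsort(scores)[-keep_filters:])                # strongest filters
    return weight[keep]

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 3, 3, 3))
print(unstructured_prune(w).nonzero()[0].size, "weights survive unstructured pruning")
print(structured_prune(w).shape, "is the dense shape left by structured pruning")
```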
“…The second is call-based data selection, where an organization uses a phone company's caller ID system to identify a caller. 24) "Most multiprocessor scheduling problems are NP, but for deterministic scheduling this is not a major problem. We can use a polynomial algorithm and develop an optimal schedule if the specific problem is not NP-complete, or we can use off-line heuristic search techniques based on classical theory implications.…”
Section: June 1995
confidence: 99%
“…Tartan combines an inference optimization software stack run on a custom hardware platform and leverages an on-chip compression engine to squeeze out network sparsity, reduce memory traffic, and yield a 40× inference execution acceleration and an order of magnitude power reduction across a wide selection of neural networks. 23,24 The latest development for inference acceleration platforms comes as an offshoot of the optical quantum computing world. Silicon photonics uses silicon as an optical medium to guide photons, much like electronics uses conductors to guide electrical signals.…”
Section: Exciting Newcomers
confidence: 99%
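The on-chip compression described in this excerpt can be pictured as simple zero-skipping: store a bitmap of nonzero positions plus the packed nonzero values, so zero activations never travel to or from memory. The sketch below is a generic illustration of that idea, not Tartan's actual format.

```python
import numpy as np

def compress_zero_skipping(tensor):
    """Pack a tensor as (nonzero bitmap, nonzero values, shape) so zeros cost no storage."""
    flat = tensor.ravel()
    mask = flat != 0
    return np.packbits(mask), flat[mask], tensor.shape

def decompress_zero_skipping(packed_mask, values, shape):
    """Rebuild the dense tensor from the bitmap and the packed nonzero values."""
    size = int(np.prod(shape))
    mask = np.unpackbits(packed_mask, count=size).astype(bool)
    flat = np.zeros(size, dtype=values.dtype)
    flat[mask] = values
    return flat.reshape(shape)

rng = np.random.default_rng(1)
acts = rng.standard_normal((4, 64)) * (rng.random((4, 64)) > 0.7)  # ~70% zeros, as after ReLU
packed = compress_zero_skipping(acts)
assert np.array_equal(decompress_zero_skipping(*packed), acts)
bytes_dense = acts.nbytes
bytes_packed = packed[0].nbytes + packed[1].nbytes
print(f"memory traffic: {bytes_packed}/{bytes_dense} bytes after compression")
```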