Proceedings of the 44th Annual International Symposium on Computer Architecture 2017
DOI: 10.1145/3079856.3080215
Scalpel

Cited by 169 publications (26 citation statements)
References 22 publications
“…Comparison of unstructured pruning applied to FPGA and GPGPU/CPU is quite challenging, since the latter platforms cannot directly benefit from DL models compression unless a dedicated data structuring scheme is implemented [39,40,41,42]. This scheme is essential to take advantage of sparse vector-matrix and matrix-matrix multiplication operations, which are much more efficient than their dense counterparts, provided the data is prepared properly.…”
Section: Discussion (citation type: mentioning)
Confidence: 99%
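The statement above notes that CPUs and GPGPUs only benefit from unstructured pruning once the sparse weights are restructured into a dedicated format. A minimal sketch of what such a scheme looks like is the CSR (compressed sparse row) layout with a sparse matrix-vector product; this is an illustrative example, not the specific scheme of any of the cited works:

```python
import numpy as np

def to_csr(dense):
    """Convert a dense 2-D array to CSR triples (values, col_idx, row_ptr),
    storing only the nonzero entries that survive pruning."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x computed over the stored nonzeros only."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y

# Small pruned weight matrix: most entries are zero.
A = np.array([[0., 2., 0.],
              [1., 0., 3.],
              [0., 0., 0.]])
x = np.array([1., 2., 3.])
vals, cols, ptr = to_csr(A)
print(csr_matvec(vals, cols, ptr, x))  # matches A @ x -> [ 4. 10.  0.]
```

The work done is proportional to the number of nonzeros rather than the full matrix size, which is the advantage the statement refers to; the irregular indexed gather `x[col_idx[...]]` is also exactly the memory-access pattern that makes these kernels hard to run efficiently on parallel hardware.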
“…By further coupling pruning with quantization and efficient coding, in a scheme called Deep Compression, they achieved up to 49x size reduction [26]. However, deploying pruned models on highly-parallel architectures has proven problematic due to storage overhead and irregular memory access patterns of sparse matrix multiplication [65,74].…”
Section: Weight Pruning (citation type: mentioning)
Confidence: 99%
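The pruning step underlying the Deep Compression scheme mentioned above is magnitude-based: weights below a threshold are zeroed out. A hedged sketch of that step (a simplified global variant, not the cited authors' exact procedure, which also involves retraining, quantization, and coding):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights.
    Simplified one-shot global pruning; ties at the threshold may prune
    slightly more than the requested fraction."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

w = np.arange(1.0, 11.0)          # weights 1..10
pruned = magnitude_prune(w, 0.5)  # drop the 5 smallest magnitudes
print(np.count_nonzero(pruned))   # -> 5
```

The resulting zeros reduce model size only after a sparse storage format is applied, which is the storage-overhead caveat the statement raises for highly parallel architectures.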
“…Sredojevic et al [65] have proposed an algorithmic way of inducing regularity in sparse networks. Yu et al [74] have developed a hardware-aware pruning method called Scalpel, which matches the coarseness of pruning to the parallelism of underlying hardware. Our approach to packing is based on Scalpel, but applied to binarized models and using CPU bitwidth as packing granularity, while also permuting layer inputs to improve packing opportunities.…”
Section: Weight Pruning (citation type: mentioning)
Confidence: 99%
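The statement above describes packing binarized weights at CPU-bitwidth granularity. A minimal sketch of the underlying idea, assuming the standard XNOR-style encoding of +1/-1 values as single bits (this is illustrative, not the cited paper's implementation):

```python
def pack_bits(signs):
    """Pack a sequence of +1/-1 values into one integer word,
    with bit i set iff signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == +1:
            word |= 1 << i
    return word

def binary_dot(wa, wb, n):
    """Dot product of two packed +1/-1 vectors of length n:
    each mismatched bit contributes -1, each match +1,
    so dot = n - 2 * popcount(wa XOR wb)."""
    mismatches = bin(wa ^ wb).count("1")
    return n - 2 * mismatches

a = [+1, -1, +1, +1]
b = [+1, +1, -1, +1]
# True dot product: 1*1 + (-1)*1 + 1*(-1) + 1*1 = 0
print(binary_dot(pack_bits(a), pack_bits(b), 4))  # -> 0
```

Because a whole machine word of weights is consumed by one XOR and one population count, the natural packing granularity is the CPU bitwidth, which is the design choice the statement attributes to the approach built on Scalpel.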