ΔNN: Power-efficient Neural Network Acceleration using Differential Weights

Mahdiani, Hoda; Khadem, Alireza; Ghanbari, Azam; Modarressi, Mehdi; Fattahi-Bayat, Farima; Daneshtalab, Masoud

doi:10.1109/mm.2019.2948345

Cited by 4 publications

(1 citation statement)

References 10 publications

“…Weights [57], [61], [133], [153], [241] Activations [134], [237] Computation reuse and memoization Partial [133], [134], [161], [237], [239], [241], [242] Full [188], [238], [240] Computation reduction with early termination [243]-[246] instance, configurable communication network allows PEs to execute in dataflow fashion; PEs can request for partially refilling their buffers with the new data. EyerissV2 [63] proposed a hierarchical mesh interconnect with configurable router nodes which allow configuring the router for communicating the data between the source (e.g., shared memory) and destination (e.g., PEs) ports via broadcast/multicast/unicast.…”

Section: Value Similarity (mentioning)

confidence: 99%

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

Dave, Baghdadi, Nowatzki et al. 2020

Preprint

Machine learning (ML) models are widely used in many domains including media processing and generation, computer vision, medical diagnosis, embedded systems, high-performance and scientific computing, and recommendation systems. For efficiently processing these computational- and memory-intensive applications, tensors of these overparameterized models are compressed by leveraging sparsity, size reduction, and quantization of tensors. Unstructured sparsity and tensors with varying dimensions yield irregular-shaped computation, communication, and memory access patterns; processing them on hardware accelerators in a conventional manner does not inherently leverage acceleration opportunities. This paper provides a comprehensive survey on how to efficiently execute sparse and irregular tensor computations of ML models on hardware accelerators. In particular, it discusses additional enhancement modules in architecture design and software support; categorizes different hardware designs and acceleration techniques and analyzes them in terms of hardware and execution costs; highlights further opportunities in terms of hardware/software/algorithm co-design optimizations and joint optimizations among described hardware and software enhancement modules. The takeaways from this paper include: understanding the key challenges in accelerating sparse, irregular-shaped, and quantized tensors; understanding enhancements in acceleration systems for supporting their efficient computations; analyzing trade-offs in opting for a specific type of design enhancement; understanding how to map and compile models with sparse tensors on the accelerators; understanding recent design trends for efficient accelerations and further opportunities.
