Efficient Sparse Neural Networks Using Regularized Multi Block Sparsity Pattern on a GPU

Vooturi, Dharma Teja; Kothapalli, Kishore

doi:10.1109/hipc.2019.00035

Cited by 4 publications

(3 citation statements)

References 13 publications

“…The authors in [63] proposed a generic sparsity pattern termed Regularized Multi Block (RMB) sparsity pattern, an efficient storage format (CRMB), and a fast GPU algorithm for processing the RMBMM (SDMM with the multiplicand having the RMB sparsity pattern). Figure 9 shows the CRMB storage format for storing an RMB sparse matrix.…”

Section: Architecture/platform/framework and Strategy (mentioning)

confidence: 99%
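
The statement above refers to SDMM (sparse matrix times dense matrix) where the sparse operand has a block-structured pattern. As a rough illustration of why block structure helps, here is a minimal pure-Python sketch of a block-sparse times dense product. The layout used here (a dictionary of nonzero blocks keyed by block coordinates) is an assumption chosen for readability; it is not the CRMB format, whose details are not given on this page, and the function name `block_sparse_matmul` is hypothetical.

```python
def block_sparse_matmul(blocks, block_size, n_rows, dense):
    """Multiply a block-sparse matrix by a dense matrix.

    blocks     : dict mapping (block_row, block_col) to a
                 block_size x block_size list of lists (nonzero blocks only)
    block_size : side length of each square block
    n_rows     : number of rows of the sparse matrix
    dense      : dense right-hand matrix as a list of rows
    """
    n_cols_out = len(dense[0])
    out = [[0.0] * n_cols_out for _ in range(n_rows)]
    # Only stored blocks are visited, so work scales with the number
    # of nonzero blocks rather than with the full matrix size.
    for (br, bc), block in blocks.items():
        row0 = br * block_size
        col0 = bc * block_size
        for i in range(block_size):
            out_row = out[row0 + i]
            for k in range(block_size):
                a = block[i][k]
                if a == 0:
                    continue
                dense_row = dense[col0 + k]
                for j in range(n_cols_out):
                    out_row[j] += a * dense_row[j]
    return out


# A 4x4 sparse matrix stored as two nonzero 2x2 blocks (illustrative data).
blocks = {
    (0, 0): [[1.0, 2.0], [3.0, 4.0]],
    (1, 1): [[5.0, 0.0], [0.0, 6.0]],
}
dense = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
result = block_sparse_matmul(blocks, 2, 4, dense)
print(result)
```

Within each block the accesses to the dense operand are contiguous, which is the property that structured patterns aim to preserve on GPU hardware.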

GPU-Based Embedded Intelligence Architectures and Applications

Ang

Seng

2021

Electronics

This paper presents contributions to the state of the art in graphics processing unit (GPU)-based embedded intelligence (EI) research for architectures and applications. It gives a comprehensive review and representative studies of emerging and current paradigms for GPU-based EI, focusing on architecture, technologies, and applications: (1) an overview and classification of GPU-based EI research, which covers the full spectrum of the area and also serves as a concise summary of the paper's scope; (2) a detailed discussion of architecture technologies for GPU-based deep learning techniques and applications; and (3) a discussion of architecture technologies for machine learning techniques and applications. The paper aims to give useful insights for the research area and to motivate researchers towards the development of GPU-based EI for practical deployment and applications.

“…The idea of pruning was revived by Han et al. [3, 4] by simply pruning weights based on their magnitude. To improve run time performance on dense AI hardware, structured pruning methods [5-12] are proposed with various structured sparsity patterns like filter, channel, block, and multiblock.…”

Section: Related Work (mentioning)

confidence: 99%

“…But the main issue with element pruning is that the generated sparse neural networks have irregular compute and memory access patterns due to unstructured sparsity pattern, and thus cannot be efficiently mapped onto dense AI hardware. Structured pruning methods [5-12] are proposed to improve the run time performance of sparse neural networks. Unlike element pruning, where parameters are removed at an individual level, in structured pruning, parameters are first divided into structural units like filter, channel, block, multiblock, and so on and then are removed at a unit level based on the strength of the unit.…”

Section: Introduction (mentioning)

confidence: 99%
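
The statement above describes structured pruning as dividing parameters into units and removing whole units according to their strength. A minimal sketch of that idea for square blocks follows. Using the L2 norm as the "strength" of a block, the `keep_fraction` parameter, and the function name `block_prune` are all assumptions made for illustration; the cited papers may score and select units differently.

```python
import math


def block_prune(weights, block_size, keep_fraction):
    """Zero out the weakest blocks of a weight matrix.

    weights       : matrix as a list of rows
    block_size    : side length of each square block
    keep_fraction : fraction of blocks to keep (at least one is kept)
    Returns (pruned_matrix, set_of_kept_block_origins).
    """
    rows, cols = len(weights), len(weights[0])

    # Score every block by its L2 norm (an assumed strength measure).
    scores = {}
    for r0 in range(0, rows, block_size):
        for c0 in range(0, cols, block_size):
            total = 0.0
            for i in range(r0, min(r0 + block_size, rows)):
                for j in range(c0, min(c0 + block_size, cols)):
                    total += weights[i][j] ** 2
            scores[(r0, c0)] = math.sqrt(total)

    # Keep the strongest blocks; everything else is removed as a unit.
    n_keep = max(1, int(keep_fraction * len(scores)))
    kept = set(sorted(scores, key=scores.get, reverse=True)[:n_keep])

    pruned = [[0.0] * cols for _ in range(rows)]
    for (r0, c0) in kept:
        for i in range(r0, min(r0 + block_size, rows)):
            for j in range(c0, min(c0 + block_size, cols)):
                pruned[i][j] = weights[i][j]
    return pruned, kept


# Illustrative 4x4 weight matrix pruned to half of its 2x2 blocks.
weights = [
    [0.1, 0.2, 3.0, 4.0],
    [0.0, 0.1, 5.0, 6.0],
    [2.0, 2.0, 0.1, 0.0],
    [2.0, 2.0, 0.0, 0.1],
]
pruned, kept = block_prune(weights, 2, 0.5)
print(pruned)
```

Because entire blocks survive or vanish together, the resulting matrix can be stored and multiplied block by block instead of element by element.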

Ramanujan bipartite graph products for efficient block sparse neural networks

Vooturi

Varma

Kothapalli

2021

Concurrency and Computation

Self Cite

Summary: Sparse neural networks have been shown to give predictions that are competitive in accuracy with their denser versions, while also minimizing the number of arithmetic operations performed. However, current GPU hardware can only exploit structured sparsity patterns for better efficiency. We propose a framework for generating structured multilevel block sparse neural networks using the theory of graph products. Our Ramanujan bipartite graph product (RBGP) framework uses products of Ramanujan graphs to obtain the best connectivity for a given level of sparsity. This ensures that (i) the network has a structured block sparsity for which runtime-efficient algorithms exist, (ii) the model gives high prediction accuracy, owing to the better expressive power derived from the connectivity of the graph, and (iii) the graph data structure has a succinct representation that can be stored efficiently in memory. We use our framework to design a specific connectivity pattern called RBGP4, which makes efficient use of the memory hierarchy available on GPUs. We benchmark our approach on image classification and machine translation tasks with an edge GPU (Jetson Nano 2GB) as well as a server GPU (V100). Compared with commonly used sparsity patterns such as unstructured and block, we obtain significant speedups while achieving the same level of accuracy.
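
To give a feel for how a product of small bipartite graphs can yield a block-structured connectivity mask, the sketch below takes the Kronecker product of two 0/1 biadjacency matrices. This is an assumption-laden illustration: the abstract does not specify which graph product RBGP uses, the factor graphs here are small regular bipartite graphs picked by hand (their Ramanujan property is not checked), and the function name `kron_mask` is hypothetical.

```python
def kron_mask(a, b):
    """Kronecker product of two 0/1 biadjacency matrices (lists of rows)."""
    rows_b, cols_b = len(b), len(b[0])
    n_rows = len(a) * rows_b
    n_cols = len(a[0]) * cols_b
    # Entry (i, j) is nonzero only if the outer graph connects the
    # enclosing blocks AND the inner graph connects the offsets.
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Outer factor: a 2-regular bipartite graph on 3 + 3 vertices.
outer = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
# Inner factor: a complete 2 + 2 bipartite graph, i.e. a dense 2x2 block.
inner = [
    [1, 1],
    [1, 1],
]
mask = kron_mask(outer, inner)
for row in mask:
    print(row)
```

With a dense inner factor the product is an ordinary block-sparse mask; replacing the inner factor with another sparse graph would nest sparsity at a second level. Only the small factors need to be stored to describe the full mask, which is in the spirit of the succinct representation mentioned in the abstract.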