GraphChallenge.org Sparse Deep Neural Network Performance

Kepner, Jeremy; Alford, Simon; Gadepally, Vijay; Jones, Michael; Milechin, Lauren; Reuther, Albert; Robinett, Ryan; Samsi, Sid

doi:10.48550/arxiv.2004.01181

Cited by 2 publications

(5 citation statements)

References 35 publications


“…The Sparse DNN Challenge specifies a collection of large sparse DNN models [10], [11] that are representative of the latest trends in addressing challenging machine learning tasks. The challenge provides the model structure (number of layers and size of each layer) and the model weights for computing sparse DNN inference on a given input dataset.…”

Section: A. Overview of Sparse DNN Challenge (mentioning)

confidence: 99%

“…2) Steps Involved in Sparse DNN Challenge: Algorithm 1 describes the high-level steps involved in computing sparse DNN inference in the Sparse DNN Challenge [10], [18]. The challenge provides datasets comprising input data for the neural network, weights for each layer in the network, bias values for each layer, and finally the ground truth to validate whether the results are correct while computing inference.…”

Section: A. Overview of Sparse DNN Challenge (mentioning)

confidence: 99%
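The steps quoted above (input data, per-layer weights, per-layer bias, ground-truth check) can be sketched in a few lines of Python. This is a minimal illustration, not the challenge's reference code: the function names `sparse_layer` and `infer`, the dictionary-of-dictionaries sparse format, the scalar bias, and the clipping threshold `y_max=32.0` are all assumptions made for the example. The bias is added only to entries that received a contribution, which matches a dense bias-then-ReLU step only when the bias is not positive.

```python
# Sketch of layer-by-layer sparse DNN inference (hypothetical helper names).
# Sparse matrices are stored as {row: {col: value}} dictionaries.

def sparse_layer(features, weights, bias, y_max=32.0):
    """Compute ReLU(features @ weights + bias), clipped at y_max, keeping only nonzeros."""
    out = {}
    for row, row_vals in features.items():
        acc = {}
        for k, y in row_vals.items():
            for col, w in weights.get(k, {}).items():
                acc[col] = acc.get(col, 0.0) + y * w
        # Add the bias, apply ReLU, and clip; zeros are dropped to stay sparse.
        kept = {}
        for col, v in acc.items():
            v = min(max(v + bias, 0.0), y_max)
            if v > 0.0:
                kept[col] = v
        if kept:
            out[row] = kept
    return out


def infer(features, layers, bias):
    """Run every layer in order and return the sorted ids of rows that stay nonzero."""
    for weights in layers:
        features = sparse_layer(features, weights, bias)
    return sorted(features)


# Two input rows, two layers; the returned row ids would be compared with ground truth.
example_features = {0: {0: 1.0, 1: 1.0}, 1: {2: 1.0}}
example_layers = [
    {0: {0: 0.5}, 1: {0: 0.5}, 2: {1: 0.2}},
    {0: {0: 1.0}, 1: {1: 1.0}},
]
active_rows = infer(example_features, example_layers, bias=-0.3)
```

In this toy input, row 1 falls below zero after the first layer and is dropped, so only row 0 remains active at the end.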

“…The footprint of each stage is mapped into the buffer array independently. Thus, for block 0, yin[8], yin[10], and yin[13] that are accessed in the second stage are mapped into buffer[0], buffer[1], buffer[3], as illustrated in Figure 2(d).…”

Section: Proposed Algorithm Design (mentioning)

confidence: 99%

“…Sparse DNNs present unique scalability challenges. Realizing this, in 2019 MIT/IEEE/Amazon proposed the Sparse DNN Challenge as an extension to the Graph Challenge [10]-[14]. The Sparse DNN Challenge is created by leveraging the collective knowledge of machine learning, high-performance computing and graph analytics communities on emerging AI systems.…” [Footnote 1: Our code is open-source at https://github.com/merthidayetoglu/SpDNN Challenge2020]

Section: Introduction (mentioning)

confidence: 99%


At-Scale Sparse Deep Neural Network Inference with Efficient GPU Implementation

Hidayetoğlu, Pearson, Mailthody, et al. 2020

Preprint


This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020. Demands for network quality have increased rapidly, pushing the size and thus the memory requirements of many neural networks beyond the capacity of available accelerators. Sparse deep neural networks (SpDNN) have shown promise for reining in the memory footprint of large neural networks. However, there is room for improvement in implementing SpDNN operations on GPUs. This work presents optimized sparse matrix multiplication kernels fused with the ReLU function. The optimized kernels reuse input feature maps from the shared memory and sparse weights from registers. For multi-GPU parallelism, our SpDNN implementation duplicates weights and statically partitions the feature maps across GPUs. Results for the challenge benchmarks show that the proposed kernel design and multi-GPU parallelization achieve up to 180 TeraEdges per second inference throughput. These results are up to 4.3× faster for a single GPU and an order of magnitude faster at full scale than those of the champion of the 2019 Sparse Deep Neural Network Graph Challenge for the same generation of NVIDIA V100 GPUs. Using the same implementation, we also show that single-GPU throughput on NVIDIA A100 is 2.37× faster than on V100.
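The multi-GPU scheme described in this abstract (weights duplicated on every device, feature maps split statically) can be illustrated with a small CPU-only Python sketch. It is an assumption-laden stand-in rather than the authors' CUDA implementation: `relu_layer`, `static_partition`, and `infer_partitioned` are hypothetical names, the layer is dense for brevity, and the "workers" run one after another in a loop instead of on separate GPUs.

```python
# Sketch: replicate the weights, split the input feature rows statically across workers.

def relu_layer(rows, weights):
    """Dense stand-in for one fused multiply + ReLU layer on a block of feature rows."""
    n_out = len(weights[0])
    return [
        [max(sum(r[k] * weights[k][j] for k in range(len(r))), 0.0) for j in range(n_out)]
        for r in rows
    ]


def static_partition(n_rows, n_workers):
    """Contiguous, near-equal row ranges decided once before inference starts."""
    base, extra = divmod(n_rows, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def infer_partitioned(features, layers, n_workers):
    """Each worker pushes its own block of rows through all layers; no exchange is needed."""
    outputs = []
    for start, stop in static_partition(len(features), n_workers):
        block = features[start:stop]      # this worker's share of the feature map
        for weights in layers:            # every worker holds a full copy of the weights
            block = relu_layer(block, weights)
        outputs.extend(block)
    return outputs
```

Because each input row passes through the network independently, splitting the rows changes only where the work happens, not the result; the cost is that every worker must store all the weights.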


“…Neural network pruning and sparsification methods are successfully applied to address the storage and computational challenges of DNNs [19,24,35,42,46,55]. These approaches aim at reducing the amount of memory and computation required to propagate values through the network, typically by removing unimportant connections.…”

Section: Introduction (mentioning)

confidence: 99%

Partitioning sparse deep neural networks for scalable training and inference

Demirci, Ferhatosmanoğlu 2021

Proceedings of the ACM International Conference on Supercomputing


State-of-the-art deep neural networks (DNNs) have significant computational and data management requirements. The sizes of both training data and models continue to increase. Sparsification and pruning methods have been shown to be effective in removing a large fraction of connections in DNNs. The resulting sparse networks present unique challenges to further improving the computational efficiency of training and inference in deep learning. Both the feedforward (inference) and backpropagation steps in the stochastic gradient descent (SGD) algorithm for training sparse DNNs involve consecutive sparse matrix-vector multiplications (SpMVs). We first introduce a distributed-memory parallel SpMV-based solution for the SGD algorithm to improve its scalability. The parallelization approach is based on row-wise partitioning of weight matrices that represent neuron connections between consecutive layers. We then propose a novel hypergraph model for partitioning weight matrices to reduce the total communication volume and ensure computational load balance among processors. Experiments performed on sparse DNNs demonstrate that the proposed solution is highly efficient and scalable. Utilizing the proposed matrix partitioning scheme significantly improves the performance of our solution further.
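To make the row-wise partitioning idea in this abstract concrete, the Python sketch below computes an SpMV in which each processor owns a set of weight-matrix rows, and counts how many input-vector entries would have to be sent between processors. It is an illustration under stated assumptions, not the paper's model: `rowwise_spmv` and `communication_volume` are hypothetical names, the processors are simulated in a single loop, and the volume count (one unit per input entry per non-owning processor that needs it) is a common convention that may differ from the paper's hypergraph formulation.

```python
# Sketch: row-wise partitioned SpMV, y = W x, with W stored as {row: {col: weight}}.

def rowwise_spmv(rows, x, owner_of_row, n_procs):
    """Each simulated processor computes the outputs for the rows it owns."""
    y = {}
    for p in range(n_procs):
        for i, cols in rows.items():
            if owner_of_row[i] == p:
                y[i] = sum(w * x[j] for j, w in cols.items())
    return y


def communication_volume(rows, owner_of_row, owner_of_x):
    """Count x entries that must be sent to processors that need them but do not own them."""
    needers = {}
    for i, cols in rows.items():
        for j in cols:
            needers.setdefault(j, set()).add(owner_of_row[i])
    return sum(len(procs - {owner_of_x[j]}) for j, procs in needers.items())


# A 3x3 sparse matrix split over two processors in two different ways.
W = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}, 2: {0: 4.0, 2: 5.0}}
x = [1.0, 1.0, 1.0]
x_owner = {0: 0, 1: 0, 2: 1}
y = rowwise_spmv(W, x, {0: 0, 1: 0, 2: 1}, n_procs=2)
volume_a = communication_volume(W, {0: 0, 1: 0, 2: 1}, x_owner)
volume_b = communication_volume(W, {0: 0, 1: 1, 2: 1}, x_owner)
```

Both partitions produce the same output vector, but the second one moves row 1 away from the processor that owns the only input entry it needs, so it requires more communication. Choosing the row assignment to minimize that count while balancing work is the kind of problem a partitioning model addresses.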
