Lossless data compression is a promising software approach for reducing the bandwidth requirements of scientific applications on accelerator clusters without introducing approximation errors. Suitable compressors must be able to effectively compact floating-point data while saturating the system interconnect to avoid introducing unnecessary latencies. We present ndzip-gpu, a novel, highly efficient GPU parallelization scheme for the block compressor ndzip, which has recently set a new milestone in CPU floating-point compression speeds. Through the combination of intra-block parallelism and efficient memory access patterns, ndzip-gpu achieves high resource utilization in decorrelating multi-dimensional data via the Integer Lorenzo Transform. We further introduce a novel, efficient warp-cooperative primitive for vertical bit packing, providing a high-throughput data reduction and expansion step. Using a representative set of scientific data, we compare the performance of ndzip-gpu against five existing GPU compressors. While we observe that the effectiveness of any compressor strongly depends on the characteristics of the dataset, we demonstrate that ndzip-gpu offers the best average compression ratio for the examined data. On Nvidia Turing, Volta, and Ampere hardware, it achieves the highest single-precision throughput by a significant margin while maintaining a favorable trade-off between data reduction and throughput in the double-precision case.
CCS CONCEPTS
• Computing methodologies → Parallel algorithms; • Theory of computation → Data compression.
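The abstract mentions a warp-cooperative primitive for vertical bit packing. As a rough illustration of the general idea only (transposing a block of residuals into bit planes and eliminating all-zero planes), the following is a minimal, single-threaded C++ sketch; the function name pack_block, the 32-word block size, and the header layout are illustrative assumptions, not the paper's actual GPU implementation.

```cpp
// Illustrative sketch (not the paper's implementation): "vertical" bit packing
// of a block of 32 32-bit residual words. The block is viewed as a 32x32 bit
// matrix and transposed, so that bit position i of every input word is
// gathered into output word i. Output words that are entirely zero (bit planes
// no residual sets) are omitted; a 32-bit header records which planes remain.
#include <cstdint>
#include <vector>

std::vector<uint32_t> pack_block(const uint32_t in[32]) {
    uint32_t planes[32] = {};                      // one output word per bit plane
    for (int bit = 0; bit < 32; ++bit) {
        for (int word = 0; word < 32; ++word) {
            planes[bit] |= ((in[word] >> bit) & 1u) << word;
        }
    }
    uint32_t head = 0;                             // bitmap of non-zero planes
    std::vector<uint32_t> out;
    for (int bit = 0; bit < 32; ++bit) {
        if (planes[bit] != 0) {
            head |= 1u << bit;
            out.push_back(planes[bit]);
        }
    }
    out.insert(out.begin(), head);                 // header first, then kept planes
    return out;
}
```

Expansion would invert these steps: read the header bitmap, re-insert the omitted zero planes, and transpose back. The abstract's contribution lies in carrying out such a reduction and expansion step cooperatively within a GPU warp at high throughput.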