ZHW: A Numerical CODEC for Big Data Scientific Computation

Barrow, M J; Wu, Zhengtao; Lloyd, Scott; Gokhale, Maya; Patel, Haritosh; Lindström, Peter

doi:10.1109/icfpt56656.2022.9974258

Cited by 4 publications

(1 citation statement)

References 25 publications


“…Datasets. We conduct our evaluation and comparison based on six typical real-world HPC simulation datasets from the Scientific Data Reduction Benchmarks [23]: HACC (cosmology particle simulation) [1], CESM (climate simulation) [40], Hurricane (ISABEL weather simulation) [41], Nyx (cosmology simulation) [42], QMC-PACK (quantum Monte Carlo simulation) [43], and RTM (reverse time migration, seismic imaging for petroleum exploration) [44], which have been widely used in previous compression studies [14, 32, 45–53]. The details are shown in Table 1.…”

Section: RTM (mentioning)

confidence: 99%

FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs

Zhang, Tian, Di, et al. 2023

Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing


Today's large-scale scientific applications running on high-performance computing (HPC) systems generate vast data volumes. Thus, data compression is becoming a critical technique to mitigate the storage burden and data-movement cost. However, existing lossy compressors for scientific data cannot achieve a high compression ratio and throughput simultaneously, hindering their adoption in many applications requiring fast compression, such as in-memory compression. To this end, in this work, we develop a fast and high-ratio error-bounded lossy compressor on GPUs for scientific data (called FZ-GPU). Specifically, we first design a new compression pipeline that consists of fully parallelized quantization, bitshuffle, and our newly designed fast encoding. Then, we propose a series of deep architectural optimizations for each kernel in the pipeline to take full advantage of CUDA architectures. We propose a warp-level optimization to avoid data conflicts for bit-wise operations in bitshuffle, maximize shared memory utilization, and eliminate unnecessary data movements by fusing different compression kernels. Finally, we evaluate FZ-GPU on two NVIDIA GPUs (i.e., A100 and RTX A4000) using six representative scientific datasets from SDRBench. Results on the A100 GPU show that FZ-GPU achieves an average speedup of 4.2× over cuSZ and an average speedup of 37.0× over a multi-threaded CPU implementation of our algorithm under the same error bound. FZ-GPU also achieves an average speedup of 2.3× and an average compression ratio improvement of 2.0× over cuZFP under the same data distortion.

CCS Concepts: Theory of computation → Massively parallel algorithms; Data compression.
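The first two pipeline stages named in the abstract, error-bounded quantization followed by bitshuffle, can be illustrated with a small CPU sketch. This is not FZ-GPU's implementation: the function names (`quantize`, `zigzag`, `bitshuffle`, and their inverses), the uniform quantizer with step `2 * error_bound`, the zigzag sign mapping, and the 16-bit word width are all assumptions chosen for illustration, and the fast encoding stage is omitted entirely.

```python
# Illustrative sketch only: a serial, pure-Python stand-in for the
# "quantization -> bitshuffle" stages. All names and parameters here are
# hypothetical and do not come from the FZ-GPU paper or code.

def quantize(values, error_bound):
    """Map each float to an integer code; step 2*eb keeps the error within eb."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]

def dequantize(codes, error_bound):
    """Reconstruct approximate floats from integer codes."""
    step = 2.0 * error_bound
    return [c * step for c in codes]

def zigzag(code):
    """Map signed ints to unsigned so small magnitudes use only low bits."""
    return 2 * code if code >= 0 else -2 * code - 1

def unzigzag(word):
    """Inverse of zigzag."""
    return word // 2 if word % 2 == 0 else -(word + 1) // 2

def bitshuffle(words, width=16):
    """Transpose bits: plane k collects bit k of every word (words < 2**width)."""
    planes = []
    for bit in range(width):
        plane = 0
        for i, w in enumerate(words):
            plane |= ((w >> bit) & 1) << i
        planes.append(plane)
    return planes

def bitunshuffle(planes, count):
    """Inverse of bitshuffle for `count` words."""
    words = [0] * count
    for bit, plane in enumerate(planes):
        for i in range(count):
            words[i] |= ((plane >> i) & 1) << bit
    return words

def compress(values, error_bound, width=16):
    """Quantize, zigzag, then bitshuffle into bit planes."""
    words = [zigzag(c) for c in quantize(values, error_bound)]
    return bitshuffle(words, width)

def decompress(planes, count, error_bound):
    """Undo bitshuffle and zigzag, then dequantize."""
    codes = [unzigzag(w) for w in bitunshuffle(planes, count)]
    return dequantize(codes, error_bound)
```

When the quantization codes are small, the high bit planes produced by `bitshuffle` are entirely zero, which is the kind of redundancy a subsequent encoder can remove cheaply. On a GPU each plane would be built in parallel rather than with nested loops.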
