2016
DOI: 10.1016/j.jpdc.2016.03.010

Reducing memory usage by the lifting-based discrete wavelet transform with a unified buffer on a GPU

Highlights:
• We propose a memory-saving method for the lifting-based discrete wavelet transform on a GPU.
• Our method reduces memory usage by unifying the input buffer and output buffer.
• Our compact data representation interprets a sequence of memory addresses as a circular permutation.
• Experimental results on four GPU architectures are presented.
• Our unified method can transform a problem twice as large, with a maximum speedup of 3.9.

Abstract: In this study, to improve th…

Cited by 11 publications (5 citation statements)
References: 24 publications

“…In the next step, we will try using graphics processing unit (GPU) to achieve the wavelet transform and solution of sparse representation with low cost and fast process in the proposed algorithm. Similar work has been studied by some scholars [38][39][40][41], but these methods are not quite suitable for our situation. Modern GPUs mostly use the pipeline structure with altitudinal parallelism, which can implement parallel processing on large amount of calculation.…”
Section: Fig 4 Multi-band Joint Local Sparse Tracking Algorithm Via W... (mentioning)
confidence: 90%
“…However, bit truncation degrades the quality of the reconstructed image when the inverse DWT is applied. There have been some DWT implementations on graphics processing units (GPUs) [72][73][74][75][76][77][78]; however, GPUs are relatively expensive for low-cost sensing platforms.…”
Section: Related Work (mentioning)
confidence: 99%
“…The GPCF-LWT integrates the devised LBB, improved lifting scheme, the three implementation optimizations, and the improved data-oriented DLB. This analysis evaluates the SURFSTAND that is a surface characterization system employing modern CPU core [10], and two other multi-GPU implementations of RC&LB and the method proposed by Ikuzawa [46] [45]. It should be noted that the pure dataset division method was applied for the two multi-GPU implementations.…”
Section: A Case Study On Online Engineering Surface Filtration (mentioning)
confidence: 99%
“…It is worth mentioning that CUDA supports the single float (32-bit, abbreviated as FP32) and the half float (16-bit, FP16) precisions [19,45] [19,46]. This experiment tested both of these two precisions.…”
Section: A Case Study On Online Engineering Surface Filtration (mentioning)
confidence: 99%