2017
DOI: 10.15803/ijnc.7.2_208

Efficient Algorithms for Stream Compaction on GPUs

Abstract: Stream compaction, also known as stream filtering or selection, produces a smaller output array that contains the indices of only the wanted elements from the input array for further processing. Because the number of data elements to be filtered can be very large, the performance of selection is of great concern. Modern Graphics Processing Units (GPUs) are increasingly used to accelerate massively data-parallel applications. In this paper, we designed and implemented two new algo…
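As an illustration of the definition in the abstract, here is a minimal sequential sketch of stream compaction built from a flag pass, an exclusive prefix sum, and a scatter. It is not one of the paper's two algorithms (the abstract is truncated before they are described); the function name `compact_indices` and the `keep` predicate are hypothetical names chosen for this example.

```python
from itertools import accumulate


def compact_indices(values, keep):
    """Return the indices of the wanted elements, in input order.

    `keep` is a hypothetical predicate; it is an assumption of this
    sketch, not something defined by the paper.
    """
    # Flag pass: 1 for a wanted element, 0 otherwise.
    flags = [1 if keep(v) else 0 for v in values]
    # Exclusive prefix sum: offsets[i] counts the wanted elements
    # before position i, so it is the output slot for element i.
    # The final entry, offsets[-1], is the total number kept.
    offsets = [0] + list(accumulate(flags))
    out = [0] * offsets[-1]
    # Scatter pass: every wanted element writes its own index.
    for i, flag in enumerate(flags):
        if flag:
            out[offsets[i]] = i
    return out


wanted = compact_indices([3, -1, 4, -1, 5], lambda v: v > 0)
```

Each iteration of the flag and scatter loops is independent of the others, which is why this formulation is commonly mapped onto one GPU thread per element, with the prefix sum as the only step that needs cooperation between threads.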

Citations: Cited by 10 publications (8 citation statements)
References: References 6 publications
“…We apply V-Quant and RV-Quant to training to minimize memory cost. During training, in order to compress the sparse large activations on GPU, we use the existing work in [1]. In order to obtain quantized networks for inference, we perform fine-tuning with V-Quant for a small number of additional epochs, e.g., 1-3 epochs after total 90 epochs of original training.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Compared with the existing methods of low memory cost in training [2] [5], our proposed method reduces computation cost by avoiding re-computation during back-propagation. More importantly, our proposed method has a potential of further reduction in computation cost especially in Equation (1). It is because the activation y i is mostly in low precision in our method.…”
Section: Potential Of Further Reduction In Computation Cost
Citation type: mentioning
confidence: 99%
“…8). Conceptually, this is an application of stream compaction [8] and usually implemented with a prefix sum [56, 13]: Given a bitmap of size M , generate an indices array of size M containing i at position i if the i-th bit is set. Otherwise, store an invalid marker.…”
Section: Number Of Assigned Blocks)
Citation type: mentioning
confidence: 99%
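The bitmap step described in the citation statement above can be sketched as follows. The quoted passage is truncated, so this is only an assumed reading: the bitmap is modeled as a Python integer, `INVALID` is a hypothetical marker value, and both function names are invented for this example rather than taken from the citing paper.

```python
INVALID = -1  # hypothetical invalid marker; the citing paper's value is not given


def bitmap_to_marked_indices(bitmap, m):
    """Build an array of size m holding i at position i if bit i of
    `bitmap` is set, and INVALID otherwise."""
    return [i if (bitmap >> i) & 1 else INVALID for i in range(m)]


def drop_invalid(marked):
    """Compact the marked array by removing INVALID entries.

    This sequential filter stands in for the compaction pass, which
    the quoted text says is usually implemented with a prefix sum.
    """
    return [i for i in marked if i != INVALID]


marked = bitmap_to_marked_indices(0b10110, 5)
dense = drop_invalid(marked)
```

With bits 1, 2, and 4 set, the marked array keeps those three positions and the compacted result lists them in ascending order.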
“…This extra storage cost can be further compressed by exploiting the non-uniform distribution of values[1,43] 6. Applying PWLQ on both weights and activations is discussed in the supplementary material 7.…”
Citation type: mentioning
confidence: 99%