2016
DOI: 10.1002/cpe.4022

C2CU: a CUDA C program generator for bulk execution of a sequential algorithm

Abstract: Several important tasks, including matrix computation, signal processing, sorting, dynamic programming, encryption, and decryption, can be performed by oblivious sequential algorithms. A sequential algorithm is oblivious if the address accessed at each time step does not depend on the input data. A bulk execution of a sequential algorithm executes it for many independent inputs, either in turn or in parallel. A number of works have been devoted to designing and implementing parallel algorithms for a single input. How…
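
The two definitions in the abstract can be illustrated with a small sequential C sketch. Prefix sums is used here only as a convenient example of an oblivious algorithm. The function names, the sizes, and the interleaved layout (element i of input j stored at b[i * p + j]) are assumptions made for this illustration, not details taken from the paper.

```c
#include <stddef.h>

/* Oblivious prefix sums on one input: the addresses touched
   (a[0], a[1], a[1], a[2], ...) are fixed by n alone and never
   depend on the values stored in a. */
void prefix_sums(int *a, size_t n) {
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}

/* Bulk execution of the same algorithm for p independent inputs.
   Assumed layout: element i of input j lives at b[i * p + j].
   On a GPU, each value of j could be handled by one thread; here the
   inputs are simply processed in turn. */
void bulk_prefix_sums(int *b, size_t n, size_t p) {
    for (size_t j = 0; j < p; j++)
        for (size_t i = 1; i < n; i++)
            b[i * p + j] += b[(i - 1) * p + j];
}
```

Because the algorithm is oblivious, every input follows the same access pattern shifted by j, so with this layout the inputs touch neighbouring addresses at each step. For example, calling bulk_prefix_sums on the three interleaved inputs (1,2,3,4), (10,20,30,40), and (100,200,300,400) leaves the prefix sums of each input in the same interleaved positions.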

Cited by 6 publications (8 citation statements)
References 28 publications
“…However, the approach cannot be applied to every kind of problem. It applies only when (i) the sequential algorithm for the problem is oblivious [34] and (ii) the execution of the algorithm can be divided into small tasks, where a sequential algorithm is oblivious if the address accessed at each time step does not depend on the input data. Problems whose sequential algorithms are oblivious include not only dynamic programming but also matrix computation, signal processing, sorting, encryption, and decryption.…”
Section: Discussion (mentioning)
confidence: 99%
“…We implemented the single-thread version so that each thread computes one multiplication. This implementation is based on the idea proposed in [16]. There is no warp divergence because all threads execute the same instructions; that is, this implementation is also based on the warp-synchronous programming technique.…”
Section: Results (mentioning)
confidence: 99%
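
The idea of one thread per multiplication with no divergence can be sketched in sequential C. This is not the cited paper's implementation: the operand size WORDS, the function names, the packed operand layout, and the use of schoolbook multiplication are all assumptions made for illustration. The loop over tid stands in for the GPU thread index.

```c
#include <stdint.h>

#define WORDS 2  /* 32-bit words per operand (hypothetical size) */

/* Schoolbook multiplication of two WORDS-word numbers, least significant
   word first. Loop bounds are fixed and there is no data-dependent branch,
   so every execution performs the same instruction sequence. */
void mul_one(const uint32_t *x, const uint32_t *y, uint32_t *z) {
    for (int k = 0; k < 2 * WORDS; k++)
        z[k] = 0;
    for (int i = 0; i < WORDS; i++) {
        uint64_t carry = 0;
        for (int j = 0; j < WORDS; j++) {
            /* The sum cannot overflow 64 bits: (2^32-1)^2 + 2*(2^32-1) = 2^64-1. */
            uint64_t t = (uint64_t)x[i] * y[j] + z[i + j] + carry;
            z[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
        z[i + WORDS] = (uint32_t)carry;
    }
}

/* Bulk execution: p independent multiplications, one per simulated thread.
   Assumed layout: operands and products are packed back to back. */
void bulk_mul(const uint32_t *x, const uint32_t *y, uint32_t *z, int p) {
    for (int tid = 0; tid < p; tid++)
        mul_one(x + tid * WORDS, y + tid * WORDS, z + tid * 2 * WORDS);
}
```

Since mul_one contains no branch that depends on operand values, threads running it in lockstep would never take different paths, which is the property the statement above describes as the absence of warp divergence.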
“…However, no prior research is premised on bulk execution. In contrast, to accelerate the bulk execution of multiple-length multiplication, our proposed method uses the idea in [16], which shows a technique for more efficient GPU implementations of bulk execution that takes the GPU architecture into account.…”
Section: Introduction (mentioning)
confidence: 99%
“…CUDA gives developers access to the virtual instruction set and memory of the parallel computational elements in NVIDIA GPUs. In many cases, GPUs are more efficient than multicore processors, since they have hundreds of processor cores and very high memory bandwidth [3-5].…”
Section: Introduction (mentioning)
confidence: 99%