2019
DOI: 10.20532/cit.2019.1004579

A Parallelization of Non-Serial Polyadic Dynamic Programming on GPU

Abstract: Parallelization of Non-Serial Polyadic Dynamic Programming (NPDP) on high-throughput manycore architectures, such as NVIDIA GPUs, suffers from load imbalance, i.e., a non-optimal mapping between the subproblems of NPDP and the processing elements of the GPU. NPDP exhibits non-uniformity in the number of subproblems as well as in computational complexity across phases. In NPDP parallelization, phases are computed sequentially, whereas the subproblems of each phase are computed concurrently. Therefore, it is essential…
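To make the mapping problem concrete, the following CUDA sketch implements the matrix-chain-ordering recurrence, a classic NPDP instance, using the naive scheme the abstract alludes to: one kernel launch per phase and a flat one-thread-per-subproblem mapping. All identifiers (npdpPhase, solveNPDP, IDX) are illustrative assumptions, not the paper's adaptive generalized mapping method.

```cuda
#include <climits>
#include <cuda_runtime.h>

// Row-major index into the n x n DP cost table (illustrative helper).
#define IDX(i, j, n) ((i) * (n) + (j))

// One kernel launch per phase d; thread i computes subproblem cost[i][i+d].
__global__ void npdpPhase(long long *cost, const int *dims, int n, int d) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + d >= n) return;               // phase d has only n - d subproblems
    int j = i + d;
    long long best = LLONG_MAX;
    for (int k = i; k < j; ++k) {         // non-serial: reads all earlier splits k
        long long c = cost[IDX(i, k, n)] + cost[IDX(k + 1, j, n)]
                    + (long long)dims[i] * dims[k + 1] * dims[j + 1];
        if (c < best) best = c;
    }
    cost[IDX(i, j, n)] = best;
}

// Phases must run in order; subproblems inside a phase run concurrently.
void solveNPDP(long long *d_cost, const int *d_dims, int n) {
    const int threads = 256;
    cudaMemset(d_cost, 0, (size_t)n * n * sizeof(long long));  // cost[i][i] = 0
    for (int d = 1; d < n; ++d) {
        int blocks = (n - d + threads - 1) / threads;
        npdpPhase<<<blocks, threads>>>(d_cost, d_dims, n, d);
    }
    cudaDeviceSynchronize();
}
```

Because phase d contains only n − d subproblems, later launches leave most of the GPU idle; this is precisely the load imbalance the paper's adaptive mapping is designed to avoid.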

Cited by 3 publications (3 citation statements)
References 22 publications
“…Shyamala et al [11] accelerated the computation time through C++ high-performance accelerated massive parallel code. More recently, Diwan and Tembhurne [13] designed an adaptive generalized mapping method to parallelize non-serial polyadic dynamic-programming problems that utilize GPUs, for efficient mapping of subproblems onto processing threads in each phase. Biswas and Mukherjee [14] proposed a new memory optimized technique and a versatile technique of utilizing shared memory in blocks of threads to minimize time for accessing dimensions of matrices on GPU architectures.…”
Section: Introduction
confidence: 99%
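As a rough illustration of the shared-memory technique attributed to Biswas and Mukherjee [14] above, the variant below stages the matrix-dimension array in on-chip shared memory once per thread block, so the inner loop no longer re-reads dimensions from global memory. This is a generic sketch reusing the definitions from the sketch above, under the assumption that the dimension array fits in shared memory; it does not reproduce the cited paper's actual memory-optimized scheme, and all names are ours.

```cuda
// Variant of npdpPhase: dims is staged in shared memory by a cooperative copy.
__global__ void npdpPhaseShared(long long *cost, const int *dims, int n, int d) {
    extern __shared__ int sdims[];                 // (n + 1) ints, sized at launch
    for (int t = threadIdx.x; t <= n; t += blockDim.x)
        sdims[t] = dims[t];                        // coalesced copy from global memory
    __syncthreads();                               // all dims visible block-wide

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + d >= n) return;
    int j = i + d;
    long long best = LLONG_MAX;
    for (int k = i; k < j; ++k) {
        long long c = cost[i * n + k] + cost[(k + 1) * n + j]
                    + (long long)sdims[i] * sdims[k + 1] * sdims[j + 1];  // on-chip reads
        if (c < best) best = c;
    }
    cost[i * n + j] = best;
}

// Launched with a dynamic shared-memory allocation of (n + 1) * sizeof(int) bytes:
// npdpPhaseShared<<<blocks, threads, (n + 1) * sizeof(int)>>>(d_cost, d_dims, n, d);
```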
“…Shyamala et al 23 accelerated the computation time through C++ high-performance accelerated massive parallel code. More recently, Diwan and Tembhurne 24 designed an adaptive generalized mapping method to parallelize polyadic-nonserial dynamic-programming problems that utilize GPUs, for efficient mapping of subproblems onto processing threads in each phase. On shared-memory architectures, Tan et al 25 proposed a parallel algorithm based on a pipeline to fill the dynamic-programming table by decomposing the calculation operators.…”
confidence: 99%
“…{512, 1024, 2048, 4096, 8192, 12288, 16384, 20480, 24576, 28672, 32768, 36864, 40960, 45056, 49152, 53248, 57344, 61440, 65536}, and p is the number of processors, with values in the set {1, 8, 16, 32}.‡ We compare our dynamic-programming solution with Yao's sequential algorithm and our previous solution (see Section 4.3).…”
‡ https://www.u-picardie.fr/recherche/presentation/plateformes/plateforme-matrics-382844.kjsp
confidence: 99%