2019
DOI: 10.20532/cit.2019.1004579

A Parallelization of Non-Serial Polyadic Dynamic Programming on GPU

Abstract: Parallelization of Non-Serial Polyadic Dynamic Programming (NPDP) on high-throughput manycore architectures, such as NVIDIA GPUs, suffers from load imbalance, i.e., a non-optimal mapping between the subproblems of NPDP and the processing elements of the GPU. NPDP exhibits non-uniformity in the number of subproblems as well as in computational complexity across phases. In NPDP parallelization, phases are computed sequentially, whereas the subproblems of each phase are computed concurrently. Therefore, it is essential…
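To make the mapping problem concrete, the following CUDA sketch implements the matrix-chain-ordering recurrence, a classic NPDP instance, using the naive scheme the abstract alludes to: one kernel launch per phase and a flat one-thread-per-subproblem mapping. All identifiers (npdpPhase, solveNPDP, IDX) are illustrative assumptions, not the paper's adaptive generalized mapping method.

```cuda
#include <climits>
#include <cuda_runtime.h>

// Row-major index into the n x n DP cost table (illustrative helper).
#define IDX(i, j, n) ((i) * (n) + (j))

// One kernel launch per phase d; thread i computes subproblem cost[i][i+d].
__global__ void npdpPhase(long long *cost, const int *dims, int n, int d) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + d >= n) return;               // phase d has only n - d subproblems
    int j = i + d;
    long long best = LLONG_MAX;
    for (int k = i; k < j; ++k) {         // non-serial: reads all earlier splits k
        long long c = cost[IDX(i, k, n)] + cost[IDX(k + 1, j, n)]
                    + (long long)dims[i] * dims[k + 1] * dims[j + 1];
        if (c < best) best = c;
    }
    cost[IDX(i, j, n)] = best;
}

// Phases must run in order; subproblems inside a phase run concurrently.
void solveNPDP(long long *d_cost, const int *d_dims, int n) {
    const int threads = 256;
    cudaMemset(d_cost, 0, (size_t)n * n * sizeof(long long));  // cost[i][i] = 0
    for (int d = 1; d < n; ++d) {
        int blocks = (n - d + threads - 1) / threads;
        npdpPhase<<<blocks, threads>>>(d_cost, d_dims, n, d);
    }
    cudaDeviceSynchronize();
}
```

Because phase d contains only n − d subproblems, later launches leave most of the GPU idle; this is precisely the load imbalance the paper's adaptive mapping is designed to avoid.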

Cited by 3 publications (3 citation statements)
References 22 publications
“…Shyamala et al [11] accelerated the computation time through C++ high-performance accelerated massive parallel code. More recently, Diwan and Tembhurne [13] designed an adaptive generalized mapping method to parallelize non-serial polyadic dynamic-programming problems that utilize GPUs, for efficient mapping of subproblems onto processing threads in each phase. Biswas and Mukherjee [14] proposed a new memory optimized technique and a versatile technique of utilizing shared memory in blocks of threads to minimize time for accessing dimensions of matrices on GPU architectures.…”
Section: Introduction
confidence: 99%
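As a rough illustration of the shared-memory technique attributed to Biswas and Mukherjee [14] above, the variant below stages the matrix-dimension array in on-chip shared memory once per thread block, so the inner loop no longer re-reads dimensions from global memory. This is a generic sketch reusing the definitions from the sketch above, under the assumption that the dimension array fits in shared memory; it does not reproduce the cited paper's actual memory-optimized scheme, and all names are ours.

```cuda
// Variant of npdpPhase: dims is staged in shared memory by a cooperative copy.
__global__ void npdpPhaseShared(long long *cost, const int *dims, int n, int d) {
    extern __shared__ int sdims[];                 // (n + 1) ints, sized at launch
    for (int t = threadIdx.x; t <= n; t += blockDim.x)
        sdims[t] = dims[t];                        // coalesced copy from global memory
    __syncthreads();                               // all dims visible block-wide

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + d >= n) return;
    int j = i + d;
    long long best = LLONG_MAX;
    for (int k = i; k < j; ++k) {
        long long c = cost[i * n + k] + cost[(k + 1) * n + j]
                    + (long long)sdims[i] * sdims[k + 1] * sdims[j + 1];  // on-chip reads
        if (c < best) best = c;
    }
    cost[i * n + j] = best;
}

// Launched with a dynamic shared-memory allocation of (n + 1) * sizeof(int) bytes:
// npdpPhaseShared<<<blocks, threads, (n + 1) * sizeof(int)>>>(d_cost, d_dims, n, d);
```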
“…Shyamala et al 23 accelerated the computation time through C++ high-performance accelerated massive parallel code. More recently, Diwan and Tembhurne 24 designed an adaptive generalized mapping method to parallelize polyadic-nonserial dynamic-programming problems that utilize GPUs, for efficient mapping of subproblems onto processing threads in each phase. On shared-memory architectures, Tan et al 25 proposed a parallel algorithm based on a pipeline to fill the dynamic-programming table by decomposing the calculation operators.…”
confidence: 99%
“…{512, 1024, 2048, 4096, 8192, 12288, 16384, 20480, 24576, 28672, 32768, 36864, 40960, 45056, 49152, 53248, 57344, 61440, 65536}, and p is the number of processors, with values in the set {1, 8, 16, 32}.‡ We compare our dynamic-programming solution with Yao's sequential algorithm and our previous solution (see Section 4.3).…”
‡ https://www.u-picardie.fr/recherche/presentation/plateformes/plateforme-matrics-382844.kjsp
confidence: 99%