Proceedings of Programming Models and Applications on Multicores and Manycores 2014
DOI: 10.1145/2578948.2560688

A Novel CPU-GPU Cooperative Implementation of A Parallel Two-List Algorithm for the Subset-Sum Problem

Abstract: The subset-sum problem is a well-known non-deterministic polynomial-time complete (NP-complete) decision problem. This paper proposes a novel and efficient implementation of a parallel two-list algorithm for solving the problem on a graphics processing unit (GPU) using Compute Unified Device Architecture (CUDA). The algorithm is composed of a generation stage, a pruning stage, and a search stage. It is not easy to effectively implement the three stages of the algorithm on a GPU. Ways to achieve better performance…
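The generation/pruning/search pipeline named in the abstract follows the classic two-list (Horowitz–Sahni, meet-in-the-middle) scheme. The sketch below is a minimal sequential illustration of that scheme, not the paper's CUDA implementation; the function name and structure are assumptions for illustration.

```python
from bisect import bisect_left

def subset_sum_two_list(weights, target):
    """Sequential sketch of the two-list method: split the input,
    enumerate subset sums of each half (generation stage), sort one
    list (ordering used by the search), then look for complements
    via binary search (search stage)."""
    half = len(weights) // 2
    left, right = weights[:half], weights[half:]

    def all_sums(items):
        # Enumerate all 2^len(items) subset sums of one half.
        sums = [0]
        for w in items:
            sums += [s + w for s in sums]
        return sums

    a = all_sums(left)
    b = sorted(all_sums(right))      # sorted so complements are searchable
    for s in a:
        need = target - s            # complement that would reach the target
        i = bisect_left(b, need)
        if i < len(b) and b[i] == need:
            return True
    return False
```

Splitting the input halves the exponent: each list has 2^(n/2) entries instead of 2^n subsets, which is what makes the per-stage work regular enough to parallelize on a GPU.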

Cited by 9 publications (17 citation statements). References 22 publications.
“…There are many algorithms developed to solve the classic subset-sum problem; some of them are branch-and-bound, the parallel two-list algorithm, and genetic algorithms [4].…”
Section: Introduction
confidence: 99%
“…Their CUDA implementation will not show good speedup if the table does not fit within the device memory. Another approach [4] exploits the CPU-GPU cooperation in order to achieve a speedup factor of 9.2 over the best sequential implementation. Our implementation is solely on GPU and does not use any table for dynamic programming.…”
Section: Introduction
confidence: 99%
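The "table" this citation refers to is the standard dynamic-programming formulation of subset sum, whose memory footprint grows with the target value — the limitation the citing authors sidestep. A minimal sequential sketch of that formulation (illustrative; not the code of either paper):

```python
def subset_sum_dp(weights, target):
    """Tabular dynamic programming: reachable[s] is True when some
    subset of the weights sums to s. The table holds target + 1
    entries, so memory scales with the target value - the footprint
    that can exceed GPU device memory for large instances."""
    reachable = [False] * (target + 1)
    reachable[0] = True                      # the empty subset sums to 0
    for w in weights:
        # Iterate downwards so each weight is used at most once.
        for s in range(target, w - 1, -1):
            if reachable[s - w]:
                reachable[s] = True
    return reachable[target]
```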
“…The basic unit of execution in CUDA is the so-called kernel. When a CUDA program invokes a kernel on the host side, the thread blocks within the grid are enumerated and distributed to multiprocessors with available execution capacity; all threads within a grid can be executed in parallel …”
Section: The Proposed GPU Implementation and Optimization
confidence: 99%
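The enumeration described in this quote can be mimicked in plain Python: in a 1-D CUDA launch, each thread's global index is `blockIdx.x * blockDim.x + threadIdx.x` (those names are CUDA's built-ins; the helper below is purely illustrative).

```python
def global_thread_ids(grid_dim, block_dim):
    """Flatten a 1-D grid of thread blocks into global thread indices,
    mirroring how CUDA enumerates the blocks of a grid and runs their
    threads in parallel: id = blockIdx * blockDim + threadIdx."""
    return [block * block_dim + thread
            for block in range(grid_dim)       # blocks in the grid
            for thread in range(block_dim)]    # threads in each block
```

In a real kernel each of these indices would identify the element of the problem (e.g. one entry of a subset-sum list) that a single thread processes.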
“…GPGPU (General-Purpose computing on Graphics Processing Units) is a typical instance. Example implementations, such as the two-list algorithm for the subset-sum problem [9] or a protein structure similarity search engine [10], illustrate the approach: executing parallel algorithms on a GPU requires adjusting them to the specific architecture.…”
Section: A Parallel Algorithms Testing
confidence: 99%