2014
DOI: 10.1109/les.2014.2311317

A High Throughput Efficient Approach for Decoding LDPC Codes onto GPU Devices

Abstract: The low-density parity-check (LDPC) decoding process is known to be compute intensive. This kind of digital communication application has recently been implemented on graphics processing unit (GPU) devices for LDPC code performance estimation and/or for real-time measurements. Previous studies of LDPC decoding on GPUs were all based on implementations of the flooding-based decoding algorithm, which provides massive computation parallelism. More efficient layered schedules have been proposed in the literature because decod…
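The abstract contrasts the flooding schedule, in which every check-node update of an iteration is computed from the same snapshot of the soft values, with the layered schedule, in which values refreshed by one check are reused by the next within the same iteration. The sketch below is only a generic min-sum illustration of that difference; the data structures (Code, L, R) are assumed for the example and this is not the decoder proposed in the paper.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Code {
    // check_vars[c] lists the variable-node indices connected to check node c.
    std::vector<std::vector<int>> check_vars;
};

// Layered (row-layered) iteration: checks are processed one after the other and
// the posterior LLRs L are refreshed immediately, so the next check already sees
// the updated values. R[c][j] is the last check-to-variable message on edge (c, j).
void layered_iteration(const Code& code, std::vector<float>& L,
                       std::vector<std::vector<float>>& R) {
    for (std::size_t c = 0; c < code.check_vars.size(); ++c) {
        const auto& vars = code.check_vars[c];
        std::vector<float> t(vars.size());
        for (std::size_t j = 0; j < vars.size(); ++j)
            t[j] = L[vars[j]] - R[c][j];                   // remove old contribution
        for (std::size_t j = 0; j < vars.size(); ++j) {
            float sign = 1.0f, mag = 1e30f;
            for (std::size_t k = 0; k < vars.size(); ++k) {
                if (k == j) continue;
                sign *= (t[k] < 0.0f) ? -1.0f : 1.0f;
                mag = std::min(mag, std::fabs(t[k]));
            }
            R[c][j] = sign * mag;                          // min-sum check update
            L[vars[j]] = t[j] + R[c][j];                   // immediate LLR refresh
        }
    }
}

// Flooding iteration: every check message is computed from the same snapshot of
// the LLRs, and the LLRs are only updated once all checks are done, so all checks
// can run in parallel, but roughly twice as many iterations are needed.
void flooding_iteration(const Code& code, std::vector<float>& L,
                        std::vector<std::vector<float>>& R) {
    const std::vector<float> L_snap = L;
    std::vector<std::vector<float>> R_new = R;
    for (std::size_t c = 0; c < code.check_vars.size(); ++c) {
        const auto& vars = code.check_vars[c];
        for (std::size_t j = 0; j < vars.size(); ++j) {
            float sign = 1.0f, mag = 1e30f;
            for (std::size_t k = 0; k < vars.size(); ++k) {
                if (k == j) continue;
                const float t = L_snap[vars[k]] - R[c][k];
                sign *= (t < 0.0f) ? -1.0f : 1.0f;
                mag = std::min(mag, std::fabs(t));
            }
            R_new[c][j] = sign * mag;
        }
    }
    for (std::size_t c = 0; c < code.check_vars.size(); ++c)   // apply all at once
        for (std::size_t j = 0; j < code.check_vars[c].size(); ++j) {
            L[code.check_vars[c][j]] += R_new[c][j] - R[c][j];
            R[c][j] = R_new[c][j];
        }
}

The flooding variant exposes every check node at once, which is why it maps so naturally onto massively parallel GPUs; the layered variant reuses fresh values inside an iteration and typically converges in roughly half as many iterations, at the price of serializing the checks within a layer.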

Cited by 36 publications (6 citation statements)
References 11 publications

“…To create a socket, four parameters are required: its datatype (given as a template parameter), its associated task, its name, and its size. Finally, a "codelet" function needs to be set (lines 14-26). This codelet will be called when the task is triggered.…”
Section: Elementary Components
mentioning confidence: 99%
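As a rough illustration of the socket/codelet pattern this citation describes, here is a minimal, self-contained C++ sketch. The names Task, Socket, create_socket, set_codelet and exec are hypothetical stand-ins rather than the cited library's actual API; only the four socket parameters (datatype as a template parameter, associated task, name, size) and the codelet callback follow the quoted description.

#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A socket carries typed data into or out of a task (hypothetical layout).
struct Socket {
    std::string name;
    std::size_t size;                      // number of elements
    std::vector<std::uint8_t> buffer;      // raw storage, size * sizeof(T) bytes
};

class Task {
public:
    explicit Task(std::string name) : name_(std::move(name)) {}

    // The four parameters from the quoted description: the datatype is the
    // template parameter, the associated task is *this, plus a name and a size.
    template <typename T>
    void create_socket(const std::string& name, std::size_t size) {
        sockets_.push_back({name, size, std::vector<std::uint8_t>(size * sizeof(T))});
    }

    // The codelet is the function that runs when the task is triggered.
    void set_codelet(std::function<int(Task&)> codelet) { codelet_ = std::move(codelet); }

    int exec() { return codelet_ ? codelet_(*this) : -1; }   // trigger the task

    Socket& socket(std::size_t i) { return sockets_[i]; }

private:
    std::string name_;
    std::vector<Socket> sockets_;
    std::function<int(Task&)> codelet_;
};

int main() {
    Task add("add");                       // hypothetical "add one" task
    add.create_socket<float>("in", 8);     // socket 0: 8 input floats
    add.create_socket<float>("out", 8);    // socket 1: 8 output floats
    add.set_codelet([](Task& t) {
        auto* in  = reinterpret_cast<float*>(t.socket(0).buffer.data());
        auto* out = reinterpret_cast<float*>(t.socket(1).buffer.data());
        for (std::size_t i = 0; i < 8; ++i) out[i] = in[i] + 1.0f;
        return 0;
    });
    return add.exec();                     // the stored codelet runs here
}

Calling exec() stands in for the task being triggered: the stored codelet is invoked with access to the task's sockets.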
“…Many SDR elementary blocks have been optimized for Intel® and ARM® CPUs. High-throughput results have been achieved on GPUs, 19‐23 but latency is still too high to meet real-time constraints and to compete with CPU implementations. 22,24‐33 This is mainly due to data transfers between the host (CPUs) and the device (GPUs), and to the nature of GPU designs, which are not optimized for latency efficiency.…”
Section: Introduction
mentioning confidence: 99%
“…GPU-based high-throughput LDPC decoders have been widely studied in the past years [21]-[26]. In [21], a high-throughput decoder based on layered scheduling was proposed.…”
Section: Introduction
mentioning confidence: 99%
“…GPU-based high-throughput LDPC decoders have been widely studied in the past years [21]-[26]. In [21], a high-throughput decoder based on layered scheduling was proposed. Some GPU-based optimizations were presented in [22] to obtain a high throughput, which reached a 1.27 Gbps peak throughput on a single GPU.…”
Section: Introduction
mentioning confidence: 99%
“…The irregular data access patterns featured in turbo and LDPC decoders make it difficult to use the Single-Instruction Multiple-Data (SIMD) extensions present on today's processors efficiently. To overcome the difficulty of efficiently accessing memory while decoding one frame and still achieve a good throughput, software decoders resorting to inter-frame parallelism (decoding multiple independent frames at the same time) are often proposed [11]-[13]. Inter-frame parallelism comes at the cost of higher latency, as many frames have to be buffered before decoding can start.…”
Section: Introduction
mentioning confidence: 99%
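To make the inter-frame parallelism idea concrete, the following C++ sketch assumes an interleaved memory layout (not taken from the cited decoders): decoding F frames together turns each irregular per-edge index into F contiguous accesses, so the inner loop over frames maps directly onto SIMD lanes.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr std::size_t F = 8;   // frames decoded together, e.g. one SIMD lane per frame

// Interleaved layout: the value of variable node v for frame f lives at llr[v * F + f].
// The irregular index v is shared by all frames, so every access touches F
// contiguous floats and the loop over f is trivially vectorizable.

// Smallest absolute value seen by one check node, per frame, over its (irregular)
// set of connected variable nodes -- the core of a min-sum check update.
void check_min_abs(const std::vector<float>& llr, const std::vector<int>& var_idx,
                   float min_abs[F]) {
    for (std::size_t f = 0; f < F; ++f) min_abs[f] = 1e30f;
    for (int v : var_idx) {
        const float* lane = &llr[static_cast<std::size_t>(v) * F];
        for (std::size_t f = 0; f < F; ++f) {          // SIMD-friendly inner loop
            const float a = std::fabs(lane[f]);
            if (a < min_abs[f]) min_abs[f] = a;
        }
    }
}

// Intra-frame version for comparison: the same irregular indices now produce a
// scattered (gather) access pattern that SIMD units handle poorly.
float check_min_abs_single(const std::vector<float>& llr_one_frame,
                           const std::vector<int>& var_idx) {
    float m = 1e30f;
    for (int v : var_idx) m = std::min(m, std::fabs(llr_one_frame[v]));
    return m;
}

The same F-frame buffering that makes the inner loop vectorizable is exactly what raises latency: decoding cannot start until all F frames have been received.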