VLSI Architectures for Layered Decoding for Irregular LDPC Codes of WiMax

2011 IEEE International Symposium on Circuits and Systems (ISCAS)

Wang

Cavallaro

2011

Abstract: We propose a multi-layer parallel decoding algorithm and VLSI architecture for decoding structured quasi-cyclic low-density parity-check (LDPC) codes. In the conventional layered decoding algorithm, the block-rows of the parity-check matrix are processed sequentially, layer after layer. The maximum number of rows that the conventional layered decoder can process simultaneously is limited to the sub-matrix size. To remove this limitation and support layer-level parallelism, we extend the conventional layered decoding algorithm and architecture to enable simultaneous processing of multiple (K) layers of a parity-check matrix, which leads to a roughly K-fold throughput increase. As a case study, we have designed a double-layer parallel LDPC decoder for the IEEE 802.11n standard. The decoder was synthesized for a TSMC 45-nm CMOS technology. With a synthesis area of 0.81 mm² and a maximum clock frequency of 815 MHz, the decoder achieves a maximum throughput of 3.0 Gbps at 15 iterations.
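The conventional layered schedule that the abstract takes as its baseline can be illustrated with a toy sketch. This is not the authors' multi-layer algorithm or architecture. It assumes a min-sum check-node rule (the abstract does not name one), treats every single row of a small parity-check matrix as its own layer rather than a block-row of a quasi-cyclic code, and uses the hypothetical names `layered_min_sum`, `H`, and `llr`. Positive LLR values are taken to mean bit 0.

```python
def layered_min_sum(H, llr, max_iters=15):
    """Row-layered min-sum decoding; each row of H is treated as one layer."""
    n = len(H[0])
    # Column indices of the ones in each row (every row needs weight >= 2).
    rows = [[j for j in range(n) if row[j]] for row in H]
    L = list(llr)                                # a posteriori LLRs
    R = [dict.fromkeys(cols, 0.0) for cols in rows]  # check-to-variable messages
    hard = [1 if v < 0 else 0 for v in L]
    for _ in range(max_iters):
        for i, cols in enumerate(rows):          # layers are visited sequentially
            # Remove this layer's previous contribution from the LLRs.
            Q = {j: L[j] - R[i][j] for j in cols}
            for j in cols:
                others = [Q[k] for k in cols if k != j]
                negatives = sum(1 for q in others if q < 0)
                sign = -1.0 if negatives % 2 else 1.0
                R[i][j] = sign * min(abs(q) for q in others)
                # The refreshed LLR is immediately visible to the next layer.
                L[j] = Q[j] + R[i][j]
        hard = [1 if v < 0 else 0 for v in L]
        # Stop early once every parity check is satisfied.
        if all(sum(hard[j] for j in cols) % 2 == 0 for cols in rows):
            break
    return hard


# Parity-check matrix of a (7,4) Hamming code, used only as a small example.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
# All-zero codeword with one unreliable, wrongly signed bit at position 0.
print(layered_min_sum(H, [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]))
# -> [0, 0, 0, 0, 0, 0, 0]
```

Because each layer reads LLRs already updated by the previous layer, the layers in this sketch cannot simply run at the same time. That sequential dependency is the limitation the abstract says the multi-layer extension is meant to relax.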

Section: VLSI Implementation and Comparison (mentioning)

confidence: 99%

Section: Introduction (mentioning)

confidence: 99%


Multi-Layer Parallel Decoding Algorithm and VLSI Architecture for Quasi-Cyclic LDPC Codes

2011 IEEE International Symposium on Circuits and Systems (ISCAS)

Wang

Cavallaro

2011

“…In a semi-parallel implementation, memories are usually required to store the temporary results. In many practical systems, semi-parallel implementations are often used to achieve 100 Mbps to 1 Gbps throughput with reasonable complexity [7,21,35,42,43,54].…”

Section: LDPC Decoder Accelerator Architecture (mentioning)

confidence: 99%

Application-Specific Accelerators for Communications

Handbook of Signal Processing Systems

Amiri

Brogioli

et al. 2010

For computation-intensive digital signal processing algorithms, complexity is exceeding the processing capabilities of general-purpose digital signal processors (DSPs). In some of these applications, DSP hardware accelerators have been widely used to off-load a variety of algorithms from the main DSP host, including FFT, FIR/IIR filters, multiple-input multiple-output (MIMO) detectors, and error correction code (Viterbi, Turbo, LDPC) decoders. Given power and cost considerations, simply implementing these computationally complex parallel algorithms on a high-speed general-purpose DSP is not very efficient. However, not all DSP algorithms are appropriate for off-loading to a hardware accelerator. First, these algorithms should have data-parallel computations and repeated operations that are amenable to hardware implementation. Second, these algorithms should have a deterministic dataflow graph that maps to parallel datapaths. The accelerators that we consider are mostly coarse-grained, to better deal with streaming data transfer and achieve both high performance and low power. In this chapter, we focus on some of the basic and advanced digital signal processing algorithms for communications and cover major examples of DSP accelerators for communications.

“…In the literature, many efficient LDPC decoder VLSI architectures have been studied [6,9,12,14,18,24,27,29,35,37,39,45,47]. Turbo decoder VLSI architectures have also been extensively investigated by many researchers [5,8,20,21,25,30,33,41,44].…”

Section: Introduction (mentioning)

confidence: 99%

A Flexible LDPC/Turbo Decoder Architecture

Cavallaro

2010

J Sign Process Syst

Low-density parity-check (LDPC) codes and convolutional Turbo codes are two of the most powerful error correcting codes that are widely used in modern communication systems. In a multi-mode baseband receiver, both LDPC and Turbo decoders may be required. However, the different decoding approaches for LDPC and Turbo codes usually lead to different hardware architectures. In this paper we propose a unified message passing algorithm for LDPC and Turbo codes and introduce a flexible soft-input soft-output (SISO) module to handle LDPC/Turbo decoding. We employ the trellis-based maximum a posteriori (MAP) algorithm as a bridge between LDPC and Turbo code decoding. We view the LDPC code as a concatenation of n super-codes where each super-code has a simpler trellis structure so that the MAP algorithm can be easily applied to it. We propose a flexible functional unit (FFU) for MAP processing of LDPC and Turbo codes with a low hardware overhead (about 15% area and timing overhead). Based on the FFU, we propose an area-efficient flexible SISO decoder architecture to support LDPC/Turbo code decoding. Multiple such SISO modules can be embedded into a parallel decoder for higher decoding throughput. As a case study, a flexible LDPC/Turbo decoder has been synthesized on a TSMC 90 nm CMOS technology with a core area of 3.2 mm². The decoder can support IEEE 802.16e LDPC codes, IEEE 802.11n LDPC codes, and 3GPP LTE