Abstract:Abstract-We present a new multi-rate architecture for decoding irregular LDPC codes in IEEE 802.16e WiMax standard. The proposed architecture utilizes the value-reuse property of offset min-sum, block-serial scheduling of computations and turbo decoding message passing algorithm. The decoder has the following advantages: 55% savings in memory, reduction of routers by 50%, and increase of throughput by 2x when compared to the recent state-of-the-art decoder architectures.Index Terms-low-density parity-check (LD… Show more
“…Table II compares the implementation result of our decoder with existing 802.11n LDPC decoders from [3], [4], [6]. The solutions from [3], [4], [6] are all based on the conventional single-layer decoding architecture. As a fair comparison, the areas of those designs are all normalized to 45nm technology.…”
Section: Vlsi Implementation and Comparisonmentioning
confidence: 99%
“…we can employ Z parallel check node processors to process Z rows in parallel. With this amount of parallelism, the conventional layered decoder can typically offer 100-1000 Mbps throughput [3], [4], [5], [6].…”
Section: Introductionmentioning
confidence: 99%
“…The layered decoding algorithm has a much higher convergence speed (up to two times faster), and requires less memory compared to the standard two-phase flooding decoding algorithm. Thus, partial-parallel layered decoder architectures are often used to decode the QC-LDPC codes [3], [4], [5], [6], [7], [8], [9], [10].…”
Abstract-We propose a multi-layer parallel decoding algorithm and VLSI architecture for decoding of structured quasi-cyclic low-density parity-check codes. In the conventional layered decoding algorithm, the block-rows of the parity check matrix are processed sequentially, or layer after layer. The maximum number of rows that can be simultaneously processed by the conventional layered decoder is limited to the sub-matrix size. To remove this limitation and support layer-level parallelism, we extend the conventional layered decoding algorithm and architecture to enable simultaneously processing of multiple (K) layers of a parity check matrix, which will lead to a roughly K-fold throughput increase. As a case study, we have designed a double-layer parallel LDPC decoder for the IEEE 802.11n standard. The decoder was synthesized for a TSMC 45-nm CMOS technology. With a synthesis area of 0.81 mm 2 and a maximum clock frequency of 815 MHz, the decoder achieves a maximum throughput of 3.0 Gbps at 15 iterations.
“…Table II compares the implementation result of our decoder with existing 802.11n LDPC decoders from [3], [4], [6]. The solutions from [3], [4], [6] are all based on the conventional single-layer decoding architecture. As a fair comparison, the areas of those designs are all normalized to 45nm technology.…”
Section: Vlsi Implementation and Comparisonmentioning
confidence: 99%
“…we can employ Z parallel check node processors to process Z rows in parallel. With this amount of parallelism, the conventional layered decoder can typically offer 100-1000 Mbps throughput [3], [4], [5], [6].…”
Section: Introductionmentioning
confidence: 99%
“…The layered decoding algorithm has a much higher convergence speed (up to two times faster), and requires less memory compared to the standard two-phase flooding decoding algorithm. Thus, partial-parallel layered decoder architectures are often used to decode the QC-LDPC codes [3], [4], [5], [6], [7], [8], [9], [10].…”
Abstract-We propose a multi-layer parallel decoding algorithm and VLSI architecture for decoding of structured quasi-cyclic low-density parity-check codes. In the conventional layered decoding algorithm, the block-rows of the parity check matrix are processed sequentially, or layer after layer. The maximum number of rows that can be simultaneously processed by the conventional layered decoder is limited to the sub-matrix size. To remove this limitation and support layer-level parallelism, we extend the conventional layered decoding algorithm and architecture to enable simultaneously processing of multiple (K) layers of a parity check matrix, which will lead to a roughly K-fold throughput increase. As a case study, we have designed a double-layer parallel LDPC decoder for the IEEE 802.11n standard. The decoder was synthesized for a TSMC 45-nm CMOS technology. With a synthesis area of 0.81 mm 2 and a maximum clock frequency of 815 MHz, the decoder achieves a maximum throughput of 3.0 Gbps at 15 iterations.
“…In a semi-parallel implementation, memories are usually required to store the temporary results. In many practical systems, semi-parallel implementations are often used to achieve 100 Mbps to 1 Gbps throughput with reasonable complexity [7,21,35,42,43,54].…”
For computation-intensive digital signal processing algorithms, complexity is exceeding the processing capabilities of general-purpose digital signal processors (DSPs). In some of these applications, DSP hardware accelerators have been widely used to off-load a variety of algorithms from the main DSP host, including FFT, FIR/IIR filters, multiple-input multiple-output (MIMO) detectors, and error correction codes (Viterbi, Turbo, LDPC) decoders. Given power and cost considerations, simply implementing these computationally complex parallel algorithms with high-speed general-purpose DSP processor is not very efficient. However, not all DSP algorithms are appropriate for off-loading to a hardware accelerator. First, these algorithms should have data-parallel computations and repeated operations that are amenable to hardware implementation. Second, these algorithms should have a deterministic dataflow graph that maps to parallel datapaths. The accelerators that we consider are mostly coarse grain to better deal with streaming data transfer for achieving both high performance and low power. In this chapter, we focus on some of the basic and advanced digital signal processing algorithms for communications and cover major examples of DSP accelerators for communications.
“…In the literature, many efficient LDPC decoder VLSI architectures have been studied [6,9,12,14,18,24,27,29,35,37,39,45,47]. Turbo decoder VLSI architectures have also been extensively investigated by many researchers [5,8,20,21,25,30,33,41,44].…”
Low-density parity-check (LDPC) codes and convolutional Turbo codes are two of the most powerful error correcting codes that are widely used in modern communication systems. In a multi-mode baseband receiver, both LDPC and Turbo decoders may be required. However, the different decoding approaches for LDPC and Turbo codes usually lead to different hardware architectures. In this paper we propose a unified message passing algorithm for LDPC and Turbo codes and introduce a flexible soft-input soft-output (SISO) module to handle LDPC/Turbo decoding. We employ the trellis-based maximum a posteriori (MAP) algorithm as a bridge between LDPC and Turbo codes decoding. We view the LDPC code as a concatenation of n super-codes where each super-code has a simpler trellis structure so that the MAP algorithm can be easily applied to it. We propose a flexible functional unit (FFU) for MAP processing of LDPC and Turbo codes with a low hardware overhead (about 15% area and timing overhead). Based on the FFU, we propose an area-efficient flexible SISO decoder architecture to support LDPC/Turbo codes decoding. Multiple such SISO modules can be embedded into a parallel decoder for higher decoding throughput. As a case study, a flexible LDPC/Turbo decoder has been synthesized on a TSMC 90 nm CMOS technology with a core area of 3.2 mm 2 . The decoder can support IEEE 802.16e LDPC codes, IEEE 802.11n LDPC codes, and 3GPP LTE
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.