This paper proposes a highly efficient preprocessing algorithm for 16 × 16 MIMO detection. The proposed algorithm combines a sorting-relaxed QR decomposition (SRQRD) with a modified greedy LLL (MGLLL) algorithm. First, SRQRD decomposes the channel matrices. This decomposition adopts a relaxed sorting strategy together with a parallel Givens rotation (GR) array scheme, which reduces the processing latency by 60% compared with conventional sorted QR decomposition (SQRD). Then, the MGLLL algorithm is applied to improve detection performance further. MGLLL adopts a parallel selection criterion and processes only the most urgent iterations. As a result, the processing latency and the number of column swaps are reduced by 50% and 75%, respectively, compared with the standard LLL algorithm. Finally, the bit-error-rate (BER) performance of this preprocessing algorithm is evaluated using two MIMO detectors. Results indicate that this preprocessor suffers negligible performance degradation compared with the combination of the standard LLL algorithm and SQRD. Based on this preprocessing algorithm, a pipelined hardware architecture is also designed. A series of systolic coordinate-rotation-digital-computer (CORDIC) arrays and highly pipelined circuits help this architecture reach a high operating frequency. The architecture is implemented in 65-nm CMOS technology and works at a maximum frequency of 625 MHz, processing a channel matrix every 16 clock cycles. The latency is 0.9 μs. Comparisons indicate that this preprocessor outperforms other similar designs in terms of latency, throughput, and gate efficiency.
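As background for the Givens rotation arrays mentioned above, the sketch below shows a single real-valued Givens rotation that zeroes the second element of a pair. It is only the textbook building block, not the SRQRD scheme or the CORDIC hardware of the paper; the function names `givens` and `apply_givens` are hypothetical and chosen for illustration.

```python
import math


def givens(a, b):
    """Return (c, s, r) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        # Nothing to rotate; use the identity rotation.
        return 1.0, 0.0, 0.0
    return a / r, b / r, r


def apply_givens(c, s, x, y):
    """Apply the rotation defined by (c, s) to the pair (x, y)."""
    return c * x + s * y, -s * x + c * y


# Illustrative values only: rotate the pair (3, 4) so its second entry vanishes.
c, s, r = givens(3.0, 4.0)
top, bottom = apply_givens(c, s, 3.0, 4.0)
```

A QR decomposition built this way applies one such rotation per sub-diagonal entry. Rotations that touch disjoint row pairs are independent of each other, which is what makes array-style parallel implementations possible.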
Polar codes have attracted increasing attention from researchers in recent years because of their capacity-achieving property. However, their error-correction performance under successive cancellation (SC) decoding is inferior to that of other modern channel codes at short or moderate block lengths. The SC-Flip (SCF) decoding algorithm outperforms SC decoding by identifying possibly erroneous decisions made in the initial SC decoding and flipping them in subsequent decoding attempts. However, it performs poorly when a codeword contains more than one erroneous decision. In this paper, we propose a path-metric-aided bit-flipping decoding algorithm to identify and correct more errors efficiently. In this algorithm, the bit-flipping list is generated from both a log-likelihood ratio (LLR) based path metric and a bit-flipping metric. The path metric is used to verify the effectiveness of each bit flip. To reduce the decoding latency and computational complexity, a corresponding pipeline architecture is designed. With this decoding algorithm and pipeline architecture, the error-correction performance improves by up to 0.25 dB compared with SCF decoding at a frame error rate of 10⁻⁴, with low average decoding latency.
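To make the flipping idea concrete, the sketch below ranks information-bit positions by LLR magnitude, which is how the baseline SCF decoder is commonly described as choosing its flip candidates. It does not implement the path-metric-aided list proposed in the abstract; the function name `flip_candidates` and the LLR values are hypothetical.

```python
def flip_candidates(llrs, info_positions, max_flips):
    """Return up to max_flips information-bit indices, least reliable first.

    Reliability is taken as |LLR| from the initial SC pass (an assumption
    matching the usual description of baseline SCF decoding).
    """
    ranked = sorted(info_positions, key=lambda i: abs(llrs[i]))
    return ranked[:max_flips]


# Made-up decision LLRs for an 8-bit toy codeword.
llrs = [4.1, -0.3, 2.7, 0.9, -5.2, 0.1, -1.8, 3.3]
info_positions = [3, 5, 6, 7]  # hypothetical non-frozen bit indices
candidates = flip_candidates(llrs, info_positions, 2)
```

In a full decoder, each candidate would be flipped in turn and SC decoding rerun from that position until a check such as a CRC passes or the candidates run out.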
Sorted QR decomposition (SQRD) has been extensively adopted in various multiple-input multiple-output (MIMO) detectors, but its sorting process incurs severe latency in larger-scale MIMO systems. This paper proposes a group-SQRD (GSQRD) algorithm to alleviate the latency problem of general SQRD architectures for larger-scale MIMO systems. By predictively sorting a group of 4 columns at one stage, GSQRD reduces the processing latency by 41% when decomposing 16×16 complex-valued matrices. This percentage rises to 68% for 128×128 matrices. To analyse the side effects, GSQRD is applied to various MIMO detectors in a simulation link, where it exhibits negligible performance degradation for MIMO detection. Moreover, GSQRD is a hardware-friendly algorithm because its division and square-root operations are converted to multiplications, which simplifies the hardware implementation. Based on this algorithm, two hardware architectures, with sorting groups of 2 and 4 columns respectively, are implemented in 65-nm CMOS technology. These architectures work at 513 MHz to decompose 16×16 complex-valued matrices. The processing latencies are 0.32 and 0.26 μs, respectively, superior to state-of-the-art designs. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
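For reference, the conventional SQRD baseline that GSQRD is compared against can be sketched as a modified Gram-Schmidt QR in which, at each step, the remaining column with the smallest norm is moved to the front. This is a minimal pure-Python sketch of that textbook procedure, assuming a full-column-rank input; it is not the GSQRD algorithm, and the function name `sqrd` is hypothetical.

```python
import math


def sqrd(H):
    """Sorted QR decomposition via modified Gram-Schmidt.

    H is a list of rows with real or complex entries and full column rank
    (assumed). Returns (Q, R, perm): Q is a list of orthonormal columns,
    R is upper triangular, and column c of Q*R equals column perm[c] of H.
    """
    m, n = len(H), len(H[0])
    q = [[H[r][c] for r in range(m)] for c in range(n)]  # work column by column
    R = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for i in range(n):
        # Sorting step: pick the remaining column with the smallest squared norm.
        norms = [sum(abs(x) ** 2 for x in q[j]) for j in range(i, n)]
        k = i + norms.index(min(norms))
        q[i], q[k] = q[k], q[i]
        perm[i], perm[k] = perm[k], perm[i]
        for row in R:
            row[i], row[k] = row[k], row[i]
        # Normalise the chosen column.
        R[i][i] = math.sqrt(sum(abs(x) ** 2 for x in q[i]))
        q[i] = [x / R[i][i] for x in q[i]]
        # Remove its component from every remaining column.
        for j in range(i + 1, n):
            R[i][j] = sum(a.conjugate() * b for a, b in zip(q[i], q[j]))
            q[j] = [b - R[i][j] * a for a, b in zip(q[i], q[j])]
    return q, R, perm


# Illustrative 2x2 matrix: the second column is shorter, so it is processed first.
H = [[3.0, 1.0], [4.0, 0.0]]
Q, R, perm = sqrd(H)
```

The norm search at the top of each iteration is the serial sorting step; it has to finish before the column can be normalised, which is the dependency that the grouped sorting in the abstract is aimed at.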