The recent paradigm shift towards the transmission of large numbers of mutually interfering information streams, as in aggressive spatial multiplexing, combined with requirements for very low processing latency despite the frequency plateauing of traditional processors, creates a need to revisit the fundamental maximum-likelihood (ML) and, consequently, the sphere-decoding (SD) detection problem. This work presents the design and VLSI architecture of MultiSphere, the first method to massively parallelize the tree search of large sphere decoders in a nearly concurrent manner, without compromising their maximum-likelihood performance and while keeping the overall processing complexity comparable to that of highly optimized sequential sphere decoders. For a 10×10 spatially multiplexed MIMO system with 16-QAM modulation and 32 processing elements, our MultiSphere architecture can reduce latency by 29× compared with well-known sequential SDs, approaching the processing latency of linear detection methods without compromising ML optimality. In MIMO multicarrier systems targeting exact ML decoding, MultiSphere achieves processing latency and hardware efficiency that are orders of magnitude better than those of approaches employing one SD per subcarrier. In addition, for 16×16 MIMO systems with both "hard" and "soft" output, approximate MultiSphere versions are shown to achieve error-rate performance similar to that of state-of-the-art approximate SDs with akin parallelization properties, while using only one tenth of the processing elements and achieving up to approximately 9× higher energy efficiency.
Super-Resolution (SR) techniques constitute a key element in image applications that need high-resolution reconstruction when, in the worst case, only a single low-resolution observation is available. SR techniques involve computationally demanding processes, and thus researchers are currently focusing on accelerating SR performance. Aiming at improving SR performance, the current paper builds on the characteristics of the L-SEABI Super-Resolution (SR) method to introduce parallelization techniques for GPUs and FPGAs. The proposed techniques accelerate GPU reconstruction of Ultra-High-Definition content, achieving three times (3x) faster-than-real-time performance on mid-range and previous-generation devices and at least nine times (9x) faster-than-real-time performance on high-end GPUs. The FPGA design leads to a scalable architecture performing four times (4x) faster than real time on low-end Xilinx Virtex 5 devices and sixty-nine times (69x) faster than real time on the Virtex 2000T. Moreover, we confirm the benefits of the proposed acceleration techniques by applying them to a different category of image-processing algorithms, window-based disparity functions, for which the proposed GPU technique improves over the CPU performance by 14 times (14x) to 64 times (64x), while the proposed FPGA architecture provides 29x acceleration.
Image super-resolution plays an important role in a plethora of applications, including video compression and motion estimation. Detecting fractional displacements among frames facilitates the removal of temporal redundancy and improves video quality by 2-4 dB PSNR [1], [2]. However, the increased complexity of the Fractional Motion Estimation (FME) process adds a significant computational load to the encoder and imposes constraints on real-time designs. Timing analysis shows that FME accounts for almost half of the entire motion estimation period, which in turn accounts for 60-90% of the total encoding time, depending on the design configuration. FME relies on an interpolation procedure that increases the resolution of any frame region by generating sub-pixels between the original pixels. Modern compression standards specify the exact filter to use in the Motion Compensation module, allowing the encoder and the decoder to create and use identical reference frames. In particular, H.264/AVC specifies a 6-tap filter for computing the luma values of half-pixels and a low-cost 2-tap filter for computing quarter-pixels. Even though it is common practice for encoder designers to integrate the standard 6-tap filter also in the Estimation module (before Compensation), the interpolation technique used for detecting the displacements (not for computing their residual) is in fact an open choice subject to certain performance trade-offs. Aiming at speeding up the Estimation, a process of considerably higher computational demand than the Compensation, this work builds on the potential to implement a lower-complexity interpolation technique instead of the H.264 6-tap filter. We integrate into the Estimation module several distinct interpolation techniques not included in the H.264 standard, while keeping the standard H.264/AVC Compensation, to measure their impact on the outcome of the prediction engine.
Related bibliography includes both ideas to avoid/replace the standard computations and architectures targeting the efficient implementation of the H.264 6-tap filtering procedure and the support of its increased memory requirements. To this end, we note that H.264 specifies a kernel with coefficients ⟨1,−5,20,20,−5,1⟩ to be multiplied with six consecutive pixels of the frame (either in column or row format). The resulting six products are accumulated and normalized to generate a single half-pixel (between the 3rd and 4th tap). The operation must be repeated for each "horizontal" and "vertical" half-pixel by sliding the kernel over the frame, both in row and column order. Moreover, there exist as many "diagonal" half-pixels, generated by applying the kernel to previously computed horizontal or vertical half-pixels. That is to say, depending on its position, we must process 6 or 36 frame pixels to compute a single half-pixel. To avoid the costly H.264 filter in the Estimation module, we study similar interpolation techniques using fewer than 6 taps, possibly exploiting gradients in the image. Section II shows three c...
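The accumulate-and-normalize step described above can be sketched as follows. This is an illustrative reference model, not the hardware architecture discussed in the text: it applies the standard ⟨1,−5,20,20,−5,1⟩ kernel to six consecutive pixels with the usual rounding offset of 16, a right shift by 5, and clipping to the 8-bit range, plus the 2-tap rounded average used for quarter-pixels. The function names are our own.

```python
def interpolate_half_pixel(pixels):
    """Apply the H.264/AVC 6-tap luma kernel <1, -5, 20, 20, -5, 1> to six
    consecutive pixels and return the half-pixel between the 3rd and 4th tap."""
    assert len(pixels) == 6
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(c * p for c, p in zip(taps, pixels))
    # Normalize: add the rounding offset 16, then shift right by 5 (divide by 32,
    # since the kernel coefficients sum to 32); finally clip to [0, 255].
    return max(0, min(255, (acc + 16) >> 5))

def interpolate_quarter_pixel(a, b):
    """2-tap rounded average between two neighboring (half- or full-) pixels."""
    return (a + b + 1) >> 1
```

A flat region is preserved (six identical pixels yield the same value), since the coefficients sum to 32 and the shift divides by exactly that. A "diagonal" half-pixel would be obtained by feeding six previously computed horizontal (or vertical) half-pixels back through `interpolate_half_pixel`, which is why it costs 36 frame pixels.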