Three years into the global deployment of 3G cellular services, the need to provide a compelling user experience has made HSPA an indispensable catalyst for a substantial subscriber transition from 2G to 3G [1]. Data rates reaching the full potential of 3GPP R6 from cost-effective mobile terminals with competitive power consumption are more crucial than ever for the commercial success of 3G and the future of cellular data communications. The computational intensity of critical functional blocks such as the demodulator and turbo decoder has made today's HSDPA receivers power hungry and costly, even at a peak rate of only 3.6Mb/s. This work explores the improvement potential of the turbo decoder from a design perspective, in a similar thrust to our recent work on the HSDPA receiver [2].

Turbo decoding requires recursive computations that virtually preclude pipelined realizations, which makes high throughput challenging without resorting to parallelism at the expense of power and die area. In HSDPA this is compounded by a large block size, which, in addition to the memory burden, complicates the interleaver realization. Of the dedicated ASIC solutions published to date [3][4][5], only one reaches the full HSDPA speed of 10.8Mb/s, at close to 1W power and 14mm² die area in a 0.18µm process [3]. To reduce both power and die size significantly, the current 0.13µm design is optimized at the architectural, algorithmic, and logic-synthesis levels to overcome speed bottlenecks while remaining frugal in its use of silicon area and power.

The turbo decoder ASIC, shown in Fig. 13.5.1, consists primarily of an efficient interleaver and a single sliding-window decoder based on a scaled max-log-MAP algorithm, which processes windows of 40 trellis steps sequentially, with forward, backward, and dummy backward recursions running in parallel on 3 state-metric units. A total of 15kB of memory is used, and a unit for early termination of the turbo decoder is also included.

Of the algorithms amenable to VLSI implementation, the max-log-MAP is the most hardware efficient but requires 0.4dB higher SNR than the log-MAP algorithm, in which the lookup table employed for a closer approximation to the theoretical MAP algorithm, together with the associated adders, lengthens the critical path of computation. In this work the max-log-MAP implementation is improved by scaling the extrinsic information with a constant fraction, so that errors introduced by the max-log approximation are attenuated as they propagate through the system during iterations [6]. The implementation complexity is close to that of the max-log-MAP, whereas the SNR loss is within 0.1dB of the log-MAP algorithm. Figure 13.5.2 shows the simulated SNR performance of the 3 implementations.

Within a given decoding algorithm, throughput is determined by the computational efficiency of the state-metric update, which is largely set by the add-compare-select (ACS) array and the normalization circuitry. Instead of computing the branch, forward, and backward state metrics sequentially and adding the 3 toge...
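To make the algorithmic trade-off concrete, the following is a minimal C sketch contrasting the log-MAP and max-log-MAP recursion operators and illustrating the constant-fraction extrinsic scaling described above. The 0.75 scale factor, the function names, and the toy values are illustrative assumptions (the text states only that a constant fraction is used); this is not the ASIC's fixed-point datapath.

```c
/*
 * Sketch: log-MAP vs. max-log-MAP operators, plus constant-fraction
 * scaling of the extrinsic information (scaled max-log-MAP).
 * Scale factor and function names are illustrative assumptions.
 */
#include <math.h>
#include <stdio.h>

/* Exact log-MAP operator: max*(a,b) = max(a,b) + log(1 + e^-|a-b|).
 * In hardware the correction term comes from a small lookup table,
 * which, with its adder, lengthens the ACS critical path.           */
static double max_star_logmap(double a, double b)
{
    double m = (a > b) ? a : b;
    return m + log(1.0 + exp(-fabs(a - b)));
}

/* Max-log-MAP operator: the correction term is dropped, leaving a
 * plain compare-select -- cheapest in hardware, ~0.4dB SNR penalty. */
static double max_star_maxlog(double a, double b)
{
    return (a > b) ? a : b;
}

/* Scaled max-log-MAP: attenuate the extrinsic LLRs by a constant
 * fraction before they are interleaved and passed to the other
 * constituent decoder, damping the over-optimism introduced by the
 * max approximation as it propagates through the iterations.        */
#define EXTRINSIC_SCALE 0.75   /* assumed value; only "constant fraction" is stated */

static void scale_extrinsic(double *llr_ext, int n)
{
    for (int i = 0; i < n; i++)
        llr_ext[i] *= EXTRINSIC_SCALE;
}

int main(void)
{
    /* Toy numbers just to show the gap between the two operators. */
    double a = 1.2, b = 0.9;
    printf("log-MAP  max*(a,b) = %.4f\n", max_star_logmap(a, b));
    printf("max-log  max*(a,b) = %.4f\n", max_star_maxlog(a, b));

    double ext[4] = { 3.1, -0.8, 1.7, -2.4 };
    scale_extrinsic(ext, 4);
    printf("scaled extrinsic[0] = %.4f\n", ext[0]);
    return 0;
}
```

The design point of the scaled variant is that the compare-select datapath stays as short as plain max-log-MAP, while the scaling (a single constant multiply, realizable as shift-and-add) recovers most of the SNR gap to log-MAP.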