This paper presents a unified, radix-4 implementation of turbo decoder, covering multiple standards such as DVB, WiMAX, 3GPP-LTE and HSPA Evolution. The radix-4, parallel interleaver is the bottleneck while using the same turbo-decoding architecture for multiple standards. This paper covers the issues associated with design of radix-4 parallel interleaver to reach to flexible turbodecoder architecture. Radix-4, parallel interleaver algorithms and their mapping on to hardware architecture is presented for multi-mode operations. The overheads associated with hardware multiplexing are found to be least significant. Other than flexibility for the turbo decoder implementation, the low silicon cost and low power aspects are also addressed by optimizing the storage scheme for branch metrics and extrinsic information. The proposed unified architecture for radix-4 turbo decoding consumes 0.65 mm 2 area in total in 65 nm CMOS process. With 4 SISO blocks used in parallel and 6 iterations, it can achieve a throughput up to 173.3 Mbps while consuming 570 mW power in total. It provides a good trade-off between silicon cost, power consumption and throughput with silicon efficiency of 0.005 mm 2 /Mbps and energy efficiency of 0.55 nJ/b/iter.