Real-time implementation of many digital signal processing (DSP) algorithms and multimedia applications is limited by the speed, energy efficiency, and area of multiplication. This limitation is especially severe in handheld multimedia devices, which have small form factors and limited battery life. In our previous work, we introduced a novel canonical signed digit (CSD) iterative multiplier structure in which the conversion from 2's complement to CSD representation is performed implicitly in real time. In this work, we further improve the iterative multiplier's performance by introducing explicit radix-8 hardware support, in which the multiplier is shifted by one octal digit in each iteration rather than by only one or two bits. This further reduces power consumption while significantly increasing computational bandwidth, at the cost of only a small increase in area. The new design also uses a bypass technique that further reduces the need for partial product reduction hardware such as carry save adder (CSA) arrays and adder trees. As a result, the new structure substantially improves multiplier throughput and energy efficiency. Moreover, a novel variable shifting technique makes the number of iterations required to complete a fixed-length multiplication data dependent, so no time or energy is spent on the unnecessary iterations incurred by multipliers with a fixed iteration count. Our results show that this new iterative structure delivers significant improvements in speed, area, and power consumption relative to previous iterative multiplier designs.
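To make CSD recoding and a data-dependent iteration count concrete, the following Python sketch models them in software. It is a behavioral illustration under our own assumptions, not the hardware structure described above. `to_csd` computes the standard non-adjacent form (a CSD encoding) of the multiplier. The hypothetical `radix8_csd_multiply` consumes three CSD digits (one octal digit) per iteration, skips all-zero windows, and stops when no digits remain. Both names are illustrative only.

```python
def to_csd(x: int) -> list[int]:
    """Recode an integer into canonical signed digits (non-adjacent form), LSB first.

    Each digit is -1, 0 or +1, and no two adjacent digits are nonzero.
    Python ints behave like infinitely wide two's complement numbers,
    so negative inputs are handled as well.
    """
    digits = []
    while x != 0:
        if x & 1:
            d = 2 - (x & 3)  # x mod 4 == 1 -> +1, x mod 4 == 3 -> -1
            x -= d
        else:
            d = 0
        digits.append(d)
        x >>= 1              # arithmetic shift right by one bit
    return digits


def radix8_csd_multiply(multiplicand: int, multiplier: int) -> tuple[int, int]:
    """Hypothetical model: consume the CSD multiplier one octal digit (3 CSD digits) per step.

    Returns (product, iterations). Iterations are counted only for windows
    that contain a nonzero digit, and the loop ends once the CSD digits are
    exhausted, so the iteration count depends on the data.
    """
    digits = to_csd(multiplier)
    product = 0
    iterations = 0
    for base in range(0, len(digits), 3):
        window = digits[base:base + 3]
        if not any(window):
            continue         # all-zero window: skipped (an analogy to bypassing)
        iterations += 1
        # Non-adjacency guarantees at most two nonzero digits per window,
        # so each iteration adds or subtracts at most two shifted multiplicands.
        for offset, d in enumerate(window):
            if d:
                product += d * (multiplicand << (base + offset))
    return product, iterations
```

For example, `radix8_csd_multiply(13, 7)` returns `(91, 2)`: 7 recodes to the CSD digits 8 - 1, which fall into two octal windows, whereas a plain radix-2 scan of `111` would need three additions. Multiplying by 64 takes a single iteration because the two leading all-zero windows are skipped.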