Embedded microprocessors are used in a wide variety of platforms, including radio-frequency identification (RFID) systems, sensor networks, and smartphones. Unfortunately, as the practical use of these microprocessors has grown, so have the security problems associated with them. Although public key cryptography (PKC) can mitigate these problems, standard implementations of PKC impose a steep computational cost on resource-constrained devices. To reduce this cost, researchers have proposed alternative implementations that accelerate multiprecision multiplication, the most expensive operation in PKC. In this paper, we further optimize this operation using four innovative methods: carry-once, optimized multiplication and accumulation (MAC), unbalanced comb, and optimized comb-window. These methods yield further performance improvements of 2%, 17%, 4.5%, and 9.5%, respectively, on representative modern microprocessors, including the ATmega128 and MSP430.