Abstract.To improve the speed of modular multiplication operation on ECC processor over GF (p), the paper presented a novel hardware implementation of Montgomery algorithm. Based on analyzing basic Montgomery modular multiplication algorithm, this work applied multi-step operation to Montgomery algorithm, which can accelerate speed by reducing the number of clocks. Simulation with Modelsim indicates that a completion of modular multiplication requires only 16 clock circles. Finally, the design of hardware architectures was evaluated on Altera Stratix III families. By following the new design concept, the multiplier structure can reach to a higher performance. Compared with other modular multiplier, the computation time of improved modular multiplier is lesser, decreasing 42% and reaching to 0.2 s µ . It is estimated that the computation of a 256-bit scalar point multiplication over GF(p) would take about 0.76 ms.