In this paper, we present a high-performance architecture for elliptic curve cryptography (ECC) over Curve448, which, to the best of our knowledge, is the fastest implementation of ECC point multiplication over Curve448 to date. First, we introduce a novel variant of the Karatsuba formula for asymmetric digit multipliers, suited to typical DSP primitives with asymmetric inputs. It reduces the number of required DSPs compared to previous work while preserving performance through full parallelization and pipelining. We then construct a 244-bit pipelined multiplier and an interleaved fast reduction algorithm, yielding a pipelined modular multiplication with 12 stages in total and four stages of input delay. Additionally, we present an efficient Montgomery ladder schedule that requires no additional registers. Implemented on Xilinx 7-series FPGAs (Virtex-7, Kintex-7, Artix-7, and Zynq 7020), the design achieves execution times of 0.12, 0.13, 0.24, and 0.24 ms, respectively. It increases the throughput by 242% compared to the best previous work on Zynq 7020 and by 858% compared to the best previous work on Virtex-7. Furthermore, the proposed architecture achieves an efficiency improvement of nearly 63% in terms of the Area×Time tradeoff. Lastly, we extend our architecture with well-known side-channel protections such as scalar blinding, base-point randomization, and continuous randomization.

INDEX TERMS Elliptic curve cryptography (ECC); Curve448; high-speed multiplier; asymmetric Karatsuba; field-programmable gate array (FPGA)
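
As a point of reference for the operations named above, the following is a minimal software sketch in Python, not the proposed hardware architecture. It assumes the Curve448 parameters and Montgomery ladder formulas of RFC 7748, uses a textbook one-level symmetric Karatsuba split rather than the asymmetric variant introduced in this paper, and folds high bits with the identity 2^448 ≡ 2^224 + 1 (mod p). All function names (`reduce_p448`, `mul_p448`, `x448`, and the helpers) are hypothetical, and the code is not constant-time.

```python
# Illustrative model only: plain Python integers, no pipelining, not constant-time.
P448 = 2**448 - 2**224 - 1      # Curve448 field prime (RFC 7748)
A24 = 39081                     # (A - 2) / 4 for A = 156326 (RFC 7748)
MASK224 = (1 << 224) - 1
MASK448 = (1 << 448) - 1


def reduce_p448(x):
    """Reduce a non-negative integer mod p using 2^448 = 2^224 + 1 (mod p)."""
    while x >> 448:
        hi, lo = x >> 448, x & MASK448
        x = lo + hi + (hi << 224)           # fold the high part back in
    return x - P448 if x >= P448 else x     # x < 2^448 < 2p, one subtraction suffices


def mul_p448(a, b):
    """Modular multiplication: one Karatsuba level on 224-bit halves, then folding."""
    a0, a1 = a & MASK224, a >> 224
    b0, b1 = b & MASK224, b >> 224
    low = a0 * b0
    high = a1 * b1
    mid = (a0 + a1) * (b0 + b1) - low - high    # equals a0*b1 + a1*b0
    return reduce_p448(low + (mid << 224) + (high << 448))


def add_p448(a, b):
    return reduce_p448(a + b)


def sub_p448(a, b):
    return reduce_p448(a + P448 - b)        # operands assumed in [0, p)


def x448(k, u):
    """x-only Montgomery ladder following the RFC 7748 formulas."""
    k = (k & ~3 & MASK448) | (1 << 447)     # RFC 7748 scalar clamping for X448
    x1 = u % P448
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(448)):
        bit = (k >> t) & 1
        if swap ^ bit:                      # conditional swap (branching, for clarity)
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = add_p448(x2, z2), sub_p448(x2, z2)
        c, d = add_p448(x3, z3), sub_p448(x3, z3)
        aa, bb = mul_p448(a, a), mul_p448(b, b)
        e = sub_p448(aa, bb)
        da, cb = mul_p448(d, a), mul_p448(c, b)
        s, m = add_p448(da, cb), sub_p448(da, cb)
        x3 = mul_p448(s, s)
        z3 = mul_p448(x1, mul_p448(m, m))
        x2 = mul_p448(aa, bb)
        z2 = mul_p448(e, add_p448(aa, mul_p448(A24, e)))
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return mul_p448(x2, pow(z2, P448 - 2, P448))   # affine x = X / Z


if __name__ == "__main__":
    # Diffie-Hellman consistency check with the RFC 7748 base point u = 5.
    alice, bob = 0x1234567 << 200, 0x89ABCDE << 300
    shared_a = x448(alice, x448(bob, 5))
    shared_b = x448(bob, x448(alice, 5))
    print("shared secrets match:", shared_a == shared_b)
```

Each ladder step in this sketch issues a fixed sequence of field multiplications and additions regardless of the scalar bit, which is the regularity a hardware schedule can exploit. The number of pipeline stages, the DSP mapping, and the register allocation described in the abstract are not modeled here.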