Hardware architecture optimized for implementing the elliptic-curve Diffie–Hellman ephemeral (ECDHE) on 256-bit Montgomery elliptic curves presents unique challenges, particularly for resource-constrained IoT and mobile devices. This work aims to provide an efficient hardware implementation of ECDHE on Curve25519, including a dedicated finite state machine (FSM) designed to handle point multiplication and ECDHE operations, utilizing constant-time algorithms and a unified memory block for resource management. Additionally, we introduce an optimized modular computation unit that covers modular addition, subtraction, multiplication, and inversion. Our proposed hardware architecture enhances the efficiency of ECDHE operations while maintaining low resource utilization, considerably reduced latency, and low power consumption. Synthesized on the Xilinx Artix-7 platform, our design boasts 64,000 Slices and a clock speed of 102 MHz, and it computes an ECDHE scalar multiplication operation in 1.1 ms, consuming 117 mW. The proposed hardware design can be applied to various platforms, including mobile devices and IoT systems.