Modular reduction of large values is a core operation in most common public-key cryptosystems that involves intensive computations in finite fields. Within such schemes, efficiency is a critical issue for the effectiveness of practical implementation of modular reduction. Recently, Residue Number Systems have drawn attention in cryptography application as they provide a good means for extreme long integer arithmetic and their carry-free operations make parallel implementation feasible. In this paper, we present an algorithm to calculate the precise value of “ X mod p ” directly in the RNS representation of an integer. The pipe-lined, non-pipe-lined, and parallel hardware architectures are proposed and implemented on XILINX FPGAs.