In this paper, we present a high-speed pairing coprocessor using Residue Number System (RNS) which is intrinsically suitable for parallel computation. This work improves the design of Cheung et al. [11] using a carefully selected RNS base and an optimized pipeline design of the modular multiplier. As a result, the cycle count for a modular reduction has been halved. When combining with the lazy reduction, Karatsuba-like formulas and optimal pipeline scheduling, a 128-bit optimal ate pairing computation can be completed in less than 100,000 cycles. We prototype the design on a Xilinx Virtex-6 FPGA using 5237 slices and 64 DSPs; a 128-bit pairing is computed in 0.358 ms running at 230MHz. To the best of our knowledge, this implementation outperforms all reported hardware and software designs.
Abstract-The parameter selection of Residue Number Systems (RNS) has a great impact on its computational efficiency. This paper shows that a base extension, the most costly operation in RNS Montgomery multiplication, can be more efficient when the intervals between the RNS moduli are small. We propose a systematic RNS parameter selection procedure and two methods to select RNS moduli that lead to a reduced complexity. Our experimental results confirm the advantages of the selected moduli.
Modular multiplication is the core operation in public-key cryptographic algorithms such as RSA and the Diffie-Hellman algorithm. The efficiency of the modular multiplier plays a crucial role in the performance of these cryptographic methods. In this paper, improvements to FFT-based Montgomery Modular Multiplication (FFTM 3 ) using carry-save arithmetic and pre-computation techniques are presented. Moreover, pseudo-Fermat number transform is used to enrich the supported operand sizes for the FFTM 3 . The asymptotic complexity of our method is O(l log l log log l), which is the same as the Schönhage-Strassen multiplication Algorithm (SSA). A systematic procedure to select suitable parameter set for the FFTM 3 is provided. Prototypes of the improved FFTM 3 multiplier with appropriate parameter sets are implemented on Xilinx Virtex-6 FPGA. Our method can perform 3100-bit and 4124-bit modular multiplications in 6.74 µs and 7.78 µs, respectively. It offers better computation latency and area-latency product compared to the state-of-the-art methods for operand size of 3072-bit and above.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.