Area-Time Efficient Hardware Implementation of Modular Multiplication for Elliptic Curve Cryptography

Islam, Md. Mainul; Hossain, Md. Selim; Shahjalal, Md.; Hasan, Moh. Khalid; Jang, Yeong Min

doi:10.1109/access.2020.2988379

Cited by 48 publications

(57 citation statements)

References 27 publications

“…Comparing with [23], our design requires 2.0× AT1 but offers 5.6× TR/A1. As compared with [30], although the AT2 is increased by 1.3×, our design improves the TR/A2 by 3.7×. Comparing with [27], our design improves the TR/A2 by 9.9× and 7.8×, respectively, with a similar AT2 performance.…”

Section: Results (mentioning)

confidence: 76%

“…Table III compares the performance of this design and various 256-bit interleaved modular multiplication implementations based on FPGA. To speed up the modular multiplication, both [23] and [30] use the radix-2 interleaved modular multiplication algorithm which consumes 257 clock cycles. However, thanks to the proposed ultra-high radix method, this design reduces the number of iterations from 256 to 11, leading to the computation latency reduced by 66.7% and 61.4% compared with [23] and [30] respectively.…”

Section: Results (mentioning)

confidence: 99%

“…To speed up the modular multiplication, both [23] and [30] use the radix-2 interleaved modular multiplication algorithm which consumes 257 clock cycles. However, thanks to the proposed ultra-high radix method, this design reduces the number of iterations from 256 to 11, leading to the computation latency reduced by 66.7% and 61.4% compared with [23] and [30] respectively. [27] designs two radix-4 interleaved modular multipliers through the pre-computation method [24].…”

Section: Results (mentioning)

confidence: 99%
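
For orientation, the radix-2 interleaved scheme that the statements above refer to can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the design of [23], [30], or the citing paper: the function names `interleaved_modmul_radix2` and `iterations_for` are hypothetical, and the NIST P-256 prime is used only as an example modulus. The helper shows the iteration count ceil(k/w) when w bits are consumed per step; w = 24 is an assumption chosen because it reproduces the quoted 11 iterations for k = 256, since the excerpt does not state the digit width.

```python
def interleaved_modmul_radix2(a, b, p, k=256):
    """Compute (a * b) mod p by scanning the k bits of b from MSB to LSB.

    Returns the product and the number of loop iterations performed.
    """
    assert 0 <= a < p and 0 <= b < p and p.bit_length() <= k
    c = 0
    iterations = 0
    for i in range(k - 1, -1, -1):
        c <<= 1                      # double the running result: c < 2p
        if (b >> i) & 1:
            c += a                   # add the multiplicand: c < 3p
        # Two conditional subtractions bring c back into [0, p).
        if c >= p:
            c -= p
        if c >= p:
            c -= p
        iterations += 1
    return c, iterations


def iterations_for(k, w):
    """Iterations needed when w bits of the multiplier are consumed per step."""
    return -(-k // w)  # ceiling division


# Example modulus: the NIST P-256 prime (an illustrative assumption).
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1
a, b = 0x1234567890ABCDEF, P256 - 5
product, n_iter = interleaved_modmul_radix2(a, b, P256)
```

One bit per iteration gives 256 loop iterations for a 256-bit operand, which is why raising the radix shortens the loop at the cost of a wider reduction step.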

Algorithm-hardware co-design of ultra-high radix based high throughput modular multiplier

Xiao, Liu, et al. 2021

IEICE Electron. Express

This paper presents an algorithm-hardware co-design of an ultra-high radix modular multiplier for high throughput modular multiplication. First, to speed up the modular multiplication, we exploit an ultra-high radix interleaved modular multiplication algorithm with a novel segmented reduction method, which reduces the number of iterations and precomputations. Then, to further improve the throughput of the modular multiplication, we design a highly parallel modular multiplier architecture. Finally, we implement and verify the modular multiplier using the Xilinx Virtex-7 FPGA. Experimental results show it can perform a 256-bit modular multiplication in 0.56 µs with a throughput rate of up to 4999.7 Mbps.

“…The speed and occupied area of the processor entirely depend on it. Although a radix-2 multiplier consumes less hardware resources compared to higher radix (e.g., radix-4 and radix-8) multipliers [33], it is not compatible for high-speed multiplication because of its high latency. To reduce the latency, an efficient radix-4 interleaved modular multiplication algorithm is proposed as demonstrated in Algorithm 1.…”

Section: Proposed Hardware Architectures (mentioning)

confidence: 99%
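
The latency argument in the statement above can be illustrated with a generic radix-4 interleaved multiplier, which consumes two multiplier bits per iteration. This is a sketch under stated assumptions and not the "Algorithm 1" of the citing paper, which is not reproduced in this excerpt; the function name `interleaved_modmul_radix4` and the test operands are hypothetical, and the Curve25519 prime is used only as an example modulus.

```python
def interleaved_modmul_radix4(a, b, p, k=256):
    """Compute (a * b) mod p, consuming two bits of b per iteration.

    Returns the product and the number of loop iterations performed.
    """
    assert k % 2 == 0
    assert 0 <= a < p and 0 <= b < p and p.bit_length() <= k
    multiples = (0, a, 2 * a, 3 * a)      # precomputed digit multiples of a
    c = 0
    iterations = 0
    for i in range(k - 2, -1, -2):
        digit = (b >> i) & 0b11           # next radix-4 digit, MSB first
        c = (c << 2) + multiples[digit]   # c < 4p + 3p = 7p
        # Conditional subtractions of 4p, 2p, p bring c back into [0, p).
        for m in (4 * p, 2 * p, p):
            if c >= m:
                c -= m
        iterations += 1
    return c, iterations


# Example modulus: the Curve25519 prime (an illustrative assumption).
P25519 = 2**255 - 19
x, y = 0xDEADBEEFCAFEBABE, P25519 - 12345
result, n_iter = interleaved_modmul_radix4(x, y, P25519)
```

For k = 256 the loop runs 128 times instead of 256, while the intermediate value can grow to just under 7p, so the reduction step needs more comparisons than in the radix-2 case. This is the area versus latency trade-off the statement describes.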

Design and Implementation of High-Performance ECC Processor with Unified Point Addition on Twisted Edwards Curve

Islam, Hossain, Hasan, et al. 2020

Sensors

Self Cite

With the swift evolution of wireless technologies, the demand for Internet of Things (IoT) security is rising rapidly. Elliptic curve cryptography (ECC) provides an attractive solution to fulfill this demand. In recent years, Edwards curves have gained widespread acceptance in digital signatures and ECC due to their faster group operations and higher resistance against side-channel attacks (SCAs) than the Weierstrass form of elliptic curves. In this paper, we propose a high-speed, low-area, simple power analysis (SPA)-resistant field-programmable gate array (FPGA) implementation of an ECC processor with unified point addition on a twisted Edwards curve, namely Edwards25519. Efficient hardware architectures for modular multiplication, modular inversion, unified point addition, and elliptic curve point multiplication (ECPM) are proposed. To reduce the computational complexity of ECPM, the ECPM scheme is designed in projective coordinates instead of affine coordinates. The proposed ECC processor performs 256-bit point multiplication over a prime field in 198,715 clock cycles and takes 1.9 ms with a throughput of 134.5 kbps, occupying only 6543 slices on a Xilinx Virtex-7 FPGA platform. It supports high-speed public-key generation using fewer hardware resources without compromising the security level, which is a challenging requirement for IoT security.
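
The figures in the abstract above can be cross-checked with a short calculation. This sketch assumes throughput is defined as operand bits divided by latency, a common convention that the abstract does not state explicitly; the variable names are hypothetical.

```python
# Figures quoted in the abstract.
BITS = 256                # operand width of the point multiplication
CLOCK_CYCLES = 198_715    # cycles per point multiplication
LATENCY_S = 1.9e-3        # reported latency, rounded to 1.9 ms

# Assumed definition: throughput = bits processed / latency.
throughput_kbps = BITS / LATENCY_S / 1e3

# Clock frequency implied by the cycle count and the latency.
implied_clock_mhz = CLOCK_CYCLES / LATENCY_S / 1e6

print(f"throughput ~ {throughput_kbps:.1f} kbps")
print(f"implied clock ~ {implied_clock_mhz:.1f} MHz")
```

With the rounded latency of 1.9 ms this gives about 134.7 kbps, close to the reported 134.5 kbps, and an implied clock of roughly 104.6 MHz. The small gap is consistent with the latency having been rounded in the abstract.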

“…Analysis of Table 9 shows that U4-based FSMs are the ones with the highest maximum operating frequency compared to other methods. The overall design quality can be estimated by the product of used resources [63] (for example, chip area occupied by a circuit) and the latency time. As it is in [63], we use the number of LUTs to compare areas required for FSM circuits based on different models (auto, one-hot, JEDI, U2 and U4).…”

Section: Benchmark (mentioning)

confidence: 99%
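
The area-time figure of merit described in the statement above, resources multiplied by latency, can be sketched as follows. The class name `Design`, the design names, and all numeric values are hypothetical placeholders, not results from either paper; LUT count stands in for area, as in the quoted passage.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    luts: int            # area proxy: number of LUTs used
    latency_ns: float    # latency time of the circuit

    @property
    def area_time(self):
        """Area-time product; lower means better overall design quality."""
        return self.luts * self.latency_ns


# Hypothetical values, for illustration only.
designs = [
    Design("design_A", luts=120, latency_ns=8.0),
    Design("design_B", luts=150, latency_ns=6.0),
]
best = min(designs, key=lambda d: d.area_time)
```

Here `design_B` uses more LUTs but has a lower product (900 versus 960), so it ranks better under this metric even though it is the larger circuit.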

Improving the Characteristics of Multi-Level LUT-Based Mealy FSMs

et al. 2020

Contemporary digital systems include many varying sequential blocks. In this article, we discuss the case where Mealy finite state machines (FSMs) describe the behavior of sequential blocks. In many cases, performance is the most important characteristic of an FSM circuit. We propose a method that increases the operating frequency of multi-level look-up table (LUT)-based Mealy FSMs. The main idea of the proposed approach is to combine two methods of structural decomposition: (1) the known method of transformation of codes of collections of outputs into FSM state codes and (2) a new method of extension of state codes. The proposed approach allows producing FPGA-based FSMs having three levels of logic combined through a system of regular interconnections. Each function for every level of logic was implemented using a single LUT. An example of the synthesis of a Mealy FSM with the proposed architecture is shown. The effectiveness of the proposed method was confirmed by the results of experimental studies based on standard benchmark FSMs. The research results show that FSM circuits based on the proposed approach have a higher operating frequency than can be obtained using other investigated methods. The maximum operating frequency is improved by an average of 3.18 to 12.57 percent. These improvements are accompanied by a small growth of LUT count.
