Instruction-Set Accelerated Implementation of CRYSTALS-Kyber

Bisheh-Niasar, Mojtaba; Azarderakhsh, Reza; Kermani, Mehran Mozaffari

doi:10.1109/tcsi.2021.3106639

Cited by 65 publications

(39 citation statements)

References 30 publications

“…Bisheh-Niasar et al [12] deployed 2×2 BU array and improved the access pattern to reduce the computational cycle (i.e., 324 CCs). Meanwhile, Bisheh-Niasar et al [13] employed two configurable BUs in parallel, which required a larger number of CCs (i.e., 474) and performed the NTT computation at low clock frequency. However, our fully pipelined NTT design has smallest CC number and outperforms that of [11], [12], and [13] approximately 1.7×, 7×, and 19.7× acceleration, respectively.…”

Section: Implementation Results and Discussion (mentioning)

confidence: 99%

“…Meanwhile, Bisheh-Niasar et al [13] employed two configurable BUs in parallel, which required a larger number of CCs (i.e., 474) and performed the NTT computation at low clock frequency. However, our fully pipelined NTT design has smallest CC number and outperforms that of [11], [12], and [13] approximately 1.7×, 7×, and 19.7× acceleration, respectively. Thus, the proposed NTT architecture achieves superior performance compared to previous approaches.…”

Section: Implementation Results and Discussion (mentioning)

confidence: 99%

“…To compare efficacy among NTT architectures, we evaluated ATP metric of the trade-off between area requirement and latency. The fifth through eleventh columns report the [12] has smallest overall ATP value (see [13]).…”

Section: Implementation Results and Discussion (mentioning)

confidence: 99%

“…The proposed NTT architectures achieve highest throughput among the NTT designs of various parameter sets. Specifically compared with state-of-the-art studies, our NTT designs can deliver significant throughput approximately 12× and 8× that of [10] and [12] for n = 1024, 12× that of [10] for n = 512, and 1.7×, 7×, and 19.7× that of [11], [12], and [13] for n = 256, respectively.…”

Section: Implementation Results and Discussion (mentioning)

confidence: 99%

“…Four BUs have been grouped in a 2×2 BU array together with several optimization strategies to speed-up the NTT computation. Bisheh-Niasar et al have then proposed a reconfigurable resource-efficient NTT architecture supporting the Kyber [13]. In which, multiple BUs have been paralleled based on the memory ping-pong strategy to adopt various Kyber configurations.…”

Section: Introduction (mentioning)

confidence: 99%

Configurable Mixed-Radix Number Theoretic Transform Architecture for Lattice-Based Cryptography

Duong-Ngoc; Lee

IEEE Access, 2022

Lattice-based cryptography continues to dominate in the second-round finalists of the National Institute of Standards and Technology post-quantum cryptography standardization process. Computational efficiency is primarily considered to evaluate promising candidates for final round selection. In lattice-based cryptosystems, polynomial multiplication is the most expensive computation and critical to improve the performance. This paper proposes an efficient number theoretic transform (NTT) architecture to accelerate the polynomial multiplication. The proposed design applies mixed-radix multi-path delay feedback architecture and flexibly adopts various polynomial sizes. Configurable NTT design is realized to perform forward and inverse NTT computations on a unified hardware, which is then used to develop an effective polynomial multiplier. The proposed architectures were successfully accelerated on several Xilinx FPGA platforms to directly compare with state-of-the-art works. The implementation results show that the proposed NTT architectures have comparable area-time product and demonstrate 1.7∼17× performance improvement, and the proposed polynomial multipliers achieve higher performance compared with previous works. Experimental results confirmed the proposed design's applicability for high-speed large-scale data cryptoprocessors.

INDEX TERMS: Lattice-based cryptography; number theoretic transform; mixed-radix; multi-path delay feedback; post-quantum cryptography.
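The abstract's central point, that an NTT turns polynomial multiplication into a pointwise product, can be illustrated with a minimal software sketch. This is not the mixed-radix hardware architecture from the paper, and the parameters are assumptions chosen for readability (n = 8, q = 17, with 3 as a primitive 16th root of unity), not Kyber's. The function names are hypothetical. The transform is written as a naive O(n²) evaluation so that the check against schoolbook multiplication modulo x^n + 1 stays easy to follow.

```python
# Toy parameters (assumptions for illustration, not Kyber's n = 256, q = 3329).
N = 8                   # polynomial length
Q = 17                  # prime with Q ≡ 1 (mod 2N)
PSI = 3                 # primitive 2N-th root of unity mod Q (3**8 ≡ -1 mod 17)
OMEGA = PSI * PSI % Q   # primitive N-th root of unity mod Q


def ntt(a):
    """Naive forward negacyclic NTT: evaluate a(x) at the odd powers of PSI."""
    return [
        sum(a[i] * pow(PSI, i, Q) * pow(OMEGA, i * k, Q) for i in range(N)) % Q
        for k in range(N)
    ]


def intt(A):
    """Naive inverse negacyclic NTT (needs Python 3.8+ for pow(x, -1, Q))."""
    n_inv = pow(N, -1, Q)
    psi_inv = pow(PSI, -1, Q)
    omega_inv = pow(OMEGA, -1, Q)
    return [
        n_inv * pow(psi_inv, i, Q)
        * sum(A[k] * pow(omega_inv, i * k, Q) for k in range(N)) % Q
        for i in range(N)
    ]


def ntt_multiply(a, b):
    """Multiply a and b modulo (x^N + 1, Q) via a pointwise product."""
    A, B = ntt(a), ntt(b)
    return intt([x * y % Q for x, y in zip(A, B)])


def schoolbook_negacyclic(a, b):
    """Reference O(N^2) multiplication modulo (x^N + 1, Q)."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                # x^N ≡ -1, so wrapped terms are subtracted
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q
    return c


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
assert ntt_multiply(a, b) == schoolbook_negacyclic(a, b)
```

Hardware designs such as those discussed in the citation statements replace the naive sums with butterfly units arranged in log-depth stages; the sketch only shows the algebraic identity those designs compute.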
