Compact domain-specific co-processor for accelerating module lattice-based KEM

Mera, Jose Maria Bermudo; Turan, Furkan; Karmakar, Angshuman; Roy, Sujoy Sinha; Verbauwhede, Ingrid

doi:10.1109/dac18072.2020.9218727

Cited by 11 publications

(6 citation statements)

References 17 publications

“…We consume 2× more area compared to [21] and deliver 3.5× better performance. Our unified cryptoprocessor outperforms [8], [9] and shows a similar performance compared to the architectures in [6], [7].…”

Section: Comparisons With Dilithium-only Implementations (mentioning)

confidence: 97%

“…Our implementation is 2× slower but consumes 1.5× less area and provides the flexibility to do the operations in parallel or sequentially. Comparisons with Saber-only implementations: There are several works in the literature implementing Saber in hardware, e.g., [5], [6], [7], [8], [9], [24] on FPGA and [10], [11], [21], [23] on ASIC platforms. Their area and performance results along with our work are presented in Table 6.…”

Section: Comparisons With Dilithium-only Implementations (mentioning)

confidence: 99%

“…Various works exist in the literature ([5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21]) that present optimized implementations of either a PKE/KEM or a signature scheme. While such works show how to implement a given PQC algorithm optimally, they do not take real-world applications (i.e., both PKE/KEM and signature) into consideration.…”

Section: Introduction (mentioning)

confidence: 99%

A Unified Cryptoprocessor for Lattice-Based Signature and Key-Exchange

Aikata, Mert, Jacquemin, et al. 2023. IEEE Trans. Comput.

We propose design methodologies for building a compact, unified and programmable cryptoprocessor architecture that computes post-quantum key agreement and digital signature. Synergies in the two types of cryptographic primitives are used to make the cryptoprocessor compact. As a case study, the cryptoprocessor architecture has been optimized targeting the signature scheme 'CRYSTALS-Dilithium' and the key encapsulation mechanism (KEM) 'Saber', both finalists in the NIST's post-quantum cryptography standardization project. The programmable cryptoprocessor executes key generations, encapsulations, decapsulations, signature generations, and signature verifications for all the security levels of Dilithium and Saber. On a Xilinx Ultrascale+ FPGA, the proposed cryptoprocessor consumes 18,406 LUTs, 9,323 FFs, 4 DSPs, and 24 BRAMs. It achieves 200 MHz clock frequency and finishes CCA-secure key-generation/encapsulation/decapsulation operations for LightSaber in 29.6/40.4/58.3µs; for Saber in 54.9/69.7/94.9µs; and for FireSaber in 87.6/108.0/139.4µs, respectively. It finishes key-generation/sign/verify operations for Dilithium-2 in 70.9/151.6/75.2µs; for Dilithium-3 in 114.7/237/127.6µs; and for Dilithium-5 in 194.2/342.1/228.9µs, respectively, for the best-case scenario. On UMC 65nm library for ASIC the latency is improved by a factor of two due to a 2× increase in clock frequency.
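The relation between the reported FPGA latencies and the ASIC claim can be checked with a short calculation. The sketch below is illustrative only: it assumes that latency equals cycle count divided by clock frequency and that the cycle count is unchanged between FPGA and ASIC. The 400 MHz ASIC clock is derived from the stated "2× increase", not quoted from the abstract, and the function names are hypothetical.

```python
# Sketch: relate the reported Saber latencies at 200 MHz to cycle counts,
# then estimate ASIC latency under a doubled clock (assumed 400 MHz).
FPGA_CLOCK_MHZ = 200.0                 # stated in the abstract
ASIC_CLOCK_MHZ = 2 * FPGA_CLOCK_MHZ    # assumption derived from "2x increase"

# Reported FPGA latencies in microseconds: (keygen, encaps, decaps).
SABER_LATENCY_US = {
    "LightSaber": (29.6, 40.4, 58.3),
    "Saber": (54.9, 69.7, 94.9),
    "FireSaber": (87.6, 108.0, 139.4),
}


def cycles_from_latency(latency_us, clock_mhz):
    """Microseconds times MHz gives clock cycles (rounded to an integer)."""
    return round(latency_us * clock_mhz)


def latency_from_cycles(cycles, clock_mhz):
    """Cycles divided by MHz gives latency in microseconds."""
    return cycles / clock_mhz


for variant, latencies in SABER_LATENCY_US.items():
    cycles = [cycles_from_latency(t, FPGA_CLOCK_MHZ) for t in latencies]
    asic_us = [latency_from_cycles(c, ASIC_CLOCK_MHZ) for c in cycles]
    print(variant, "cycles:", cycles, "estimated ASIC latency (us):", asic_us)
```

Under these assumptions each estimated ASIC latency is half the FPGA figure, which is consistent with the factor of two stated in the abstract.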

“…A comparison of the time and power consumption of the back-end design completed using the TSMC 65 nm process with other post-quantum cryptographic algorithm hardware implementations is shown in Table 4. Study [10], study [11], and the current design all involve software/hardware implementations of the Saber algorithm. With the same flow of the algorithm, the comparison focuses on the power consumption and area of the algorithm implementation.…”

Section: Comparison With Related Literature (mentioning)

confidence: 99%

“…To further optimize the Saber algorithm, Sujoy et al [9] used vector processing instructions to process the algorithm operations in parallel, resulting in a nearly 1.5-fold increase in throughput, while increasing the latency of individual operations by a factor of about 3. In terms of hardware-software co-design, Mera et al [10] and Dang et al [11] used a hardware-software co-design strategy to allocate hardware-software resources for cryptographic algorithms through software algorithms, which can achieve high-speed and flexible cryptographic algorithms. Although hardware-software co-design has obvious advantages in terms of flexibility in algorithm implementation, there are still shortcomings in terms of latency and throughput, so hardware implementation of algorithms has become a research trend.…”

Section: Introduction (mentioning)

confidence: 99%

Hardware Design and Implementation of a Lightweight Saber Algorithm Based on DRC Method

Zheng, Zhang, et al. 2023. Electronics

With the development of quantum computers, the security of classical cryptosystems is seriously threatened, and the Saber algorithm has become one of the potential candidates for post-quantum cryptosystems (PQCs). To address the problems of long delay and the high power consumption of Saber algorithm hardware implementation, a lightweight Saber algorithm hardware design scheme based on the joint optimization of data readout and clock (DRC) was proposed. Firstly, an analysis was carried out on the hardware architecture, timing overhead and power consumption distribution of the Saber algorithm, and the key circuits that limit the performance of the algorithm were identified; secondly, a dual-port SRAM parallel reading method was adopted to improve the data reading efficiency and reduce the timing overhead of double data reading in the multiplier module. Then, a clock gating technology was used to reduce the dynamic flipping probability of internal registers and reduce the hardware power consumption of the Saber algorithm; finally, data reading and clock gating were jointly optimized to design a high-speed and low-power Saber algorithm hardware IP core. Lightweight IP cores were integrated into RISC-V SoC systems via APB bus in a TSMC 65 nm process to complete the digital back-end design. The experimental results show an IP core area of 0.99 mm² and power consumption of 8.49 mW, which is 33% lower than that reported in the related literature. Under 72 MHz & 1 V operating conditions, the number of clock cycles for the Saber algorithm’s key generation, encryption and decryption are 3315, 9204 and 1420, respectively.
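The cycle counts in this abstract can be turned into wall-clock latency and a rough energy figure. This is a sketch under stated assumptions: the cycle counts are taken to apply at the 72 MHz clock, and the 8.49 mW power figure is assumed to hold throughout each operation, which the abstract does not confirm. The helper names are hypothetical.

```python
# Sketch: convert reported cycle counts to latency and estimated energy.
CLOCK_HZ = 72e6      # 72 MHz operating point from the abstract
POWER_MW = 8.49      # reported power; assumed constant during each operation

CYCLES = {"key generation": 3315, "encryption": 9204, "decryption": 1420}


def cycles_to_microseconds(n_cycles, clock_hz):
    """Latency in microseconds for n_cycles at clock_hz."""
    return n_cycles / clock_hz * 1e6


def energy_nanojoules(latency_us, power_mw):
    """Milliwatts times microseconds gives nanojoules."""
    return latency_us * power_mw


for operation, n_cycles in CYCLES.items():
    latency_us = cycles_to_microseconds(n_cycles, CLOCK_HZ)
    energy_nj = energy_nanojoules(latency_us, POWER_MW)
    print(f"{operation}: {n_cycles} cycles, {latency_us:.2f} us, ~{energy_nj:.1f} nJ")
```

The energy values are estimates only, since the source gives a single power number rather than per-operation measurements.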

Lightweight ASIP Design for Lattice-Based Post-quantum Cryptography Algorithms

Akçay, Yalçın. 2024. Arab J Sci Eng

Lattice-based cryptography (LBC) algorithms are considered suitable candidates for post-quantum cryptography (PQC), as they dominate the standardization process put forward by the National Institute of Standards and Technology (NIST). Indeed, three of the four key encapsulation mechanism (KEM) algorithms in the third round of the process are based on computationally hard lattice problems. On the other hand, there is an urgent need for processor designs that can run PQC algorithms efficiently, especially for embedded systems. This study presents an application-specific instruction set processor (ASIP) design for the Kyber, Saber, and NewHope algorithms based on transport triggered architecture (TTA). Custom hardware accelerators are added to the baseline processor architecture for computation-intensive steps without applying any software optimization to the reference code. We compared FPGA and ASIC implementations of our design with the prominent RISC-V cores and instruction set extension studies in the literature. According to the results, the proposed design offers greater efficiency, better performance, and lower resource utilization than its competitors in most cases.
