Saber on ARM

Karmakar, Angshuman; Mera, Jose Maria Bermudo; Roy, Sujoy Sinha; Verbauwhede, Ingrid

doi:10.46586/tches.v2018.i3.243-266

Cited by 36 publications

(22 citation statements)

References 9 publications


“…Since Module-LWE algorithms involve working with vectors or matrices of polynomials, it is particularly important to ensure that these polynomials fit inside the crypto-processor memory as much as possible (because reads and writes to the internal memory through software are not cheap). When multiplying the public matrix A with the secret vector s, the matrix A is generated through rejection sampling, one row at a time, following the just-in-time approach from [55]. This reduces memory footprint so that the entire computation can fit in the polynomial cache.…”

Section: Protocol Implementations and Evaluation Results (mentioning)

confidence: 99%

“…The init instruction is used to initialize a specified polynomial with all zero coefficients. The matrix A is generated one row at a time, following a just-in-time approach [55] instead of generating and storing all the rows together, to save memory, which becomes especially useful when dealing with larger matrices such as in CRYSTALS-Kyber-1024 and CRYSTALS-Dilithium-IV. We have written a Perl script to parse such plain-text programs and convert them into 32-bit binary instructions which can be decoded by the Sapphire crypto-processor.…”

Section: Chip Architecture (mentioning)

confidence: 99%
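The just-in-time row generation described in the two statements above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the Sapphire instruction sequence or the exact sampling routine of any standardized scheme: the parameters `N`, `Q`, `K` and the helpers `sample_poly`, `poly_mul_acc`, `gen_row`, `mat_vec_jit` and `mat_vec_full` are all hypothetical names chosen for the example, and SHAKE-128 with 12-bit rejection sampling stands in for whatever PRNG-based sampler a real design uses.

```python
import hashlib

N = 8      # toy polynomial length (assumption; real parameter sets use 256)
Q = 3329   # modulus borrowed from Kyber, used here for illustration only
K = 3      # the public matrix A holds K x K polynomials

def sample_poly(seed, i, j):
    """Rejection-sample N coefficients in [0, Q) from SHAKE-128(seed || i || j)."""
    xof = hashlib.shake_128(seed + bytes([i, j]))
    nbytes = 4 * N
    while True:
        stream = xof.digest(nbytes)
        coeffs = []
        for pos in range(0, nbytes, 2):
            cand = int.from_bytes(stream[pos:pos + 2], "little") & 0x0FFF
            if cand < Q:              # reject candidates outside [0, Q)
                coeffs.append(cand)
                if len(coeffs) == N:
                    return coeffs
        nbytes *= 2                   # rare: request a longer stream and rescan

def poly_mul_acc(acc, a, b):
    """acc += a * b in Z_Q[x] / (x^N + 1), schoolbook multiplication."""
    for x in range(N):
        for y in range(N):
            k = x + y
            if k < N:
                acc[k] = (acc[k] + a[x] * b[y]) % Q
            else:                     # x^N wraps around to -1
                acc[k - N] = (acc[k - N] - a[x] * b[y]) % Q

def gen_row(seed, i):
    """Generate row i of A: K polynomials."""
    return [sample_poly(seed, i, j) for j in range(K)]

def mat_vec_jit(seed, s):
    """Compute A * s while holding only one row of A at a time."""
    result = []
    for i in range(K):
        row = gen_row(seed, i)        # K polynomials live, not K * K
        acc = [0] * N
        for j in range(K):
            poly_mul_acc(acc, row[j], s[j])
        result.append(acc)            # row is discarded on the next iteration
    return result

def mat_vec_full(seed, s):
    """Reference version that generates and stores all of A before multiplying."""
    A = [gen_row(seed, i) for i in range(K)]
    result = []
    for i in range(K):
        acc = [0] * N
        for j in range(K):
            poly_mul_acc(acc, A[i][j], s[j])
        result.append(acc)
    return result
```

Both functions produce the same product; the difference is that `mat_vec_jit` keeps `K` polynomials of `A` in memory at any moment instead of `K * K`, which is the saving the statements attribute to the row-at-a-time approach for larger matrices.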


Sapphire: A Configurable Crypto-Processor for Post-Quantum Lattice-based Protocols

Banerjee, Ukyab, Chandrakasan

2019

Preprint


Public key cryptography protocols, such as RSA and elliptic curve cryptography, will be rendered insecure by Shor's algorithm when large-scale quantum computers are built. Cryptographers are working on quantum-resistant algorithms, and lattice-based cryptography has emerged as a prime candidate. However, high computational complexity of these algorithms makes it challenging to implement lattice-based protocols on low-power embedded devices. To address this challenge, we present Sapphire, a lattice cryptography processor with configurable parameters. Efficient sampling, with a SHA-3-based PRNG, provides two orders of magnitude energy savings; a single-port RAM-based number theoretic transform memory architecture is proposed, which provides 124k-gate area savings; while a low-power modular arithmetic unit accelerates polynomial computations. Our test chip was fabricated in TSMC 40nm low-power CMOS process, with the Sapphire cryptographic core occupying 0.28 mm² area consisting of 106k logic gates and 40.25 KB SRAM. Sapphire can be programmed with custom instructions for polynomial arithmetic and sampling, and it is coupled with a low-power RISC-V micro-processor to demonstrate NIST Round 2 lattice-based CCA-secure key encapsulation and signature protocols Frodo, NewHope, qTESLA, CRYSTALS-Kyber and CRYSTALS-Dilithium, achieving up to an order of magnitude improvement in performance and energy efficiency compared to state-of-the-art hardware implementations. All key building blocks of Sapphire are constant-time and secure against timing and simple power analysis side-channel attacks. We also discuss how masking-based DPA countermeasures can be implemented on the Sapphire core without any changes to the hardware.


“…In particular, Botros et al [10] and Alkim et al [11] developed ARM Cortex-M4 implementations of Kyber. Karmakar et al [12] reported results for Saber. Chung et al [13] proposed an NTT-based implementation for an NTT-unfriendly ring, targeting Cortex-M4 and AVX2.…”

Section: Previous Work (mentioning)

confidence: 98%

Feasibility Study of Post Quantum Cryptography in TLS 1.3

Lee, Son

2023

dcs


This paper focuses on optimized constant-time software implementations of three NIST PQC KEM Finalists, CRYSTALS-Kyber, NTRU, and Saber, targeting ARMv8 microprocessor cores. All optimized implementations include explicit calls to Advanced Single-Instruction Multiple-Data instructions (a.k.a. NEON instructions). Benchmarking is performed using two platforms: 1) MacBook Air, based on an Apple M1 System on Chip (SoC), including four high-performance 'Firestorm' ARMv8 cores, running with the frequency of around 3.2 GHz, and 2) Raspberry Pi 4, single-board computer, based on the Broadcom SoC, BCM2711, with four 1.5 GHz 64-bit Cortex-A72 ARMv8 cores. In each case, only one core of the respective SoC is being used for benchmarking. The obtained results demonstrate substantial speed-ups vs. the best available implementations written in pure C. For the 'Firestorm' core of Apple M1, NEON implementations outperform pure C implementations in the case of decapsulation by factors varying in the following ranges: 1.55-1.74 for Saber, 2.96-3.04 for Kyber, and 7.24-8.49 for NTRU, depending on an algorithm's variant and security level. For encapsulation, the corresponding ranges are 1.37-1.60 for Saber, 2.33-2.45 for Kyber, and 3.05-6.68 for NTRU. These uneven speed-ups of the three lattice-based KEM finalists affect their rankings for optimized software implementations targeting ARMv8.


“…Roy et al [12] implemented a high-speed dot matrix Saber algorithm on FPGA, with a running time of 61.4 µs at a clock frequency of 250 MHz. To meet the requirements of the Saber algorithm in resource-constrained situations, and to reduce the running time and power consumption of the algorithm hardware [13][14][15][16][17], there is an urgent need to study lightweight cryptographic algorithms. Based on this, this design constructs the algorithm model based on the official documentation [18] of the algorithm, and thus completes the full hardware implementation of Saber's algorithm.…”

Section: Introduction (mentioning)

confidence: 99%

Hardware Design and Implementation of a Lightweight Saber Algorithm Based on DRC Method

Zheng, Zhang, et al.

2023

Electronics


With the development of quantum computers, the security of classical cryptosystems is seriously threatened, and the Saber algorithm has become one of the potential candidates for post-quantum cryptosystems (PQCs). To address the problems of long delay and the high power consumption of Saber algorithm hardware implementation, a lightweight Saber algorithm hardware design scheme based on the joint optimization of data readout and clock (DRC) was proposed. Firstly, an analysis was carried out on the hardware architecture, timing overhead and power consumption distribution of the Saber algorithm, and the key circuits that limit the performance of the algorithm were identified; secondly, a dual-port SRAM parallel reading method was adopted to improve the data reading efficiency and reduce the timing overhead of double data reading in the multiplier module. Then, a clock gating technology was used to reduce the dynamic flipping probability of internal registers and reduce the hardware power consumption of the Saber algorithm; finally, data reading and clock gating were jointly optimized to design a high-speed and low-power Saber algorithm hardware IP core. Lightweight IP cores were integrated into RISC-V SoC systems via APB bus in a TSMC 65 nm process to complete the digital back-end design. The experimental results show an IP core area of 0.99 mm² and power consumption of 8.49 mW, which is 33% lower than that reported in the related literature. Under 72 MHz & 1 V operating conditions, the number of clock cycles for the Saber algorithm’s key generation, encryption and decryption are 3315, 9204 and 1420, respectively.

