An Efficient Implementation of the NewHope Key Exchange on FPGAs

Xing, Yufei; Li, Shuguo

doi:10.1109/tcsi.2019.2956651

Cited by 37 publications

(11 citation statements)

References 6 publications


“…With more butterfly cores, the number of reduction units, located behind the multipliers in the butterfly units, should also be increased accordingly. In addition, the RAM structure should be carefully modified to ensure sufficient data bandwidth [BUC19] [XL20]. Similarly, the bandwidth of the Keccak core needs to be tuned accordingly, which is relatively easy because data input/output and the round function are conducted separately.…”

Section: Discussion About Performance and Resource Utilization (mentioning)

confidence: 99%
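The butterfly arrangement described in the statement above (a multiplier feeding a reduction unit inside each butterfly) can be modeled in software as follows. This is a minimal sketch under several assumptions: it uses a radix-2 Cooley-Tukey NTT over $\mathbb{Z}_q$ with the NewHope modulus q = 12289, a plain `%` stands in for whatever hardware reduction unit a real design uses, and the function names are hypothetical. It does not model the cited designs' architecture.

```python
Q = 12289  # NewHope modulus, 3 * 2**12 + 1


def butterfly(a: int, b: int, w: int, q: int = Q) -> tuple:
    """Cooley-Tukey butterfly: the multiplier output w*b feeds a reduction step."""
    t = (w * b) % q  # stands in for the reduction unit behind the multiplier
    return (a + t) % q, (a - t) % q


def primitive_root_of_unity(n: int, q: int = Q) -> int:
    """Smallest w of order exactly n mod q (n a power of two dividing q - 1)."""
    for w in range(2, q):
        if pow(w, n, q) == 1 and pow(w, n // 2, q) == q - 1:
            return w
    raise ValueError("no primitive n-th root of unity")


def ntt(coeffs: list, w: int, q: int = Q) -> list:
    """Iterative radix-2 NTT: A_k = sum_j a_j * w^(j*k) mod q (len a power of two)."""
    a = list(coeffs)
    n = len(a)
    # Bit-reversal permutation so that the butterfly stages can work in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(w, n // length, q)
        half = length // 2
        for start in range(0, n, length):
            w_k = 1
            for k in range(half):
                # Each butterfly reads two words and writes two words, so P
                # butterfly cores working in parallel need 2P reads and 2P writes
                # per cycle from the RAM.
                a[start + k], a[start + k + half] = butterfly(
                    a[start + k], a[start + k + half], w_k, q)
                w_k = w_k * w_len % q
        length <<= 1
    return a
```

For example, `ntt([1, 2, 3, 4, 5, 6, 7, 8], primitive_root_of_unity(8))` evaluates the 8-point transform. The comment inside the inner loop shows why adding butterfly cores also raises the required memory bandwidth.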

A Compact Hardware Implementation of CCA-Secure Key Exchange Mechanism CRYSTALS-KYBER on FPGA

Xing

2021

TCHES

Self Cite

121


Post-quantum cryptosystems should be prepared before the advent of powerful quantum computers to keep the information in our daily life secure. In 2016 the National Institute of Standards and Technology (NIST) launched a post-quantum standardization contest, and many works have since evaluated the candidate protocols, mainly in pure software or through hardware-software co-design on different platforms. As the contest progressed to the third round in July 2020, with only 7 finalists and 8 alternate candidates remaining, more dedicated and specific hardware designs should be considered to illustrate the intrinsic properties of a given protocol and achieve better performance. To this end, we present a standalone FPGA hardware design of CRYSTALS-KYBER, a module learning-with-errors (MLWE) based key exchange mechanism (KEM) protocol and one of the 7 finalists. Through careful scheduling of sampling and number theoretic transform (NTT) related calculations, decent performance is achieved with limited hardware resources. The implementation of Encode/Decode and the tweaked Fujisaki-Okamoto transform is described in detail, and an analysis of minimizing the memory footprint is also given. In summary, we realize adaptive chosen-ciphertext attack (CCA) secure Kyber with all selectable module dimensions k on the smallest Xilinx Artix-7 device. Our design computes the key-generation, encapsulation (encryption) and decapsulation (decryption and re-encryption) phases in 3768/5079/6668 cycles when k = 2, 6316/7925/10049 cycles when k = 3, and 9380/11321/13908 cycles when k = 4, consuming 7412/6785 LUTs, 4644/3981 FFs, 2126/1899 slices, 2/2 DSPs and 3/3 BRAMs in server/client with 6.2/6.0 ns critical path delay, and outperforms corresponding high-level synthesis (HLS) based designs and hardware-software co-designs by a large margin.
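In Kyber, the "sampling" mentioned in the abstract includes drawing noise polynomials from a centered binomial distribution (CBD). As a minimal software sketch, assuming the CBD definition from the Kyber specification (256 coefficients, a small parameter η such as 2 or 3, and input bytes that would come from a PRF whose output is not modeled here), one polynomial can be sampled as shown below. The name `cbd` is hypothetical, and the code says nothing about how the cited hardware schedules this step.

```python
def cbd(buf: bytes, eta: int) -> list:
    """Centered binomial sampling: 64*eta bytes -> 256 coefficients in [-eta, eta]."""
    if len(buf) != 64 * eta:
        raise ValueError("expected 64*eta input bytes")
    # Expand bytes to bits, least significant bit of each byte first.
    bits = [(buf[i // 8] >> (i % 8)) & 1 for i in range(512 * eta)]
    coeffs = []
    for i in range(256):
        base = 2 * i * eta
        a = sum(bits[base:base + eta])            # first eta bits
        b = sum(bits[base + eta:base + 2 * eta])  # next eta bits
        coeffs.append(a - b)                      # difference of two binomials
    return coeffs
```

Each coefficient consumes 2η bits, which is why the input length is 64η bytes for 256 coefficients.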


“…The random number is generated through a 128-bit Trivium PRNG [33] due to its reasonable throughput and small hardware resource consumption. Trivium was previously also used to generate random samples in a cryptoprocessor [38], suggesting that it is a reasonable choice for implementing a PRNG. The Gimli hash generates a 256-bit output, while Gimli AE generates a 256-bit encryption output, which corresponds to the encryption input in this protocol.…”

Section: Application Of Gimli To Rfid Authentication (mentioning)

confidence: 99%
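For reference, the Trivium stream cipher that the statement above uses as a PRNG can be modeled bit-serially as shown below. This is a sketch that follows the published Trivium specification (80-bit key and IV, 288-bit state, 4 × 288 warm-up rounds). It produces one keystream bit per step and is not the cited 128-bit PRNG design. Hardware versions commonly compute several independent steps per clock cycle. The mapping of key/IV bits to state positions is an assumption, and the output has not been checked here against official test vectors. `trivium_keystream` is a hypothetical name.

```python
def trivium_keystream(key_bits: list, iv_bits: list, nbits: int) -> list:
    """Bit-serial Trivium model: 80-bit key, 80-bit IV, 288-bit state."""
    if len(key_bits) != 80 or len(iv_bits) != 80:
        raise ValueError("Trivium uses an 80-bit key and an 80-bit IV")
    # s[0..92] <- key, s[93..176] <- IV, s[177..287] <- 0...0,1,1,1
    s = list(key_bits) + [0] * 13 + list(iv_bits) + [0] * 4 + [0] * 108 + [1, 1, 1]

    def clock() -> int:
        # Specification index s_k corresponds to s[k - 1] here.
        t1 = s[65] ^ s[92]
        t2 = s[161] ^ s[176]
        t3 = s[242] ^ s[287]
        z = t1 ^ t2 ^ t3
        t1 ^= (s[90] & s[91]) ^ s[170]
        t2 ^= (s[174] & s[175]) ^ s[263]
        t3 ^= (s[285] & s[286]) ^ s[68]
        # Shift the three registers by one position, feeding t3, t1, t2 back in.
        s[:] = [t3] + s[0:92] + [t1] + s[93:176] + [t2] + s[177:287]
        return z

    for _ in range(4 * 288):  # initialization rounds, output discarded
        clock()
    return [clock() for _ in range(nbits)]
```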

A Flexible Gimli Hardware Implementation in FPGA and Its Application to RFID Authentication Protocols

2021


Radio Frequency Identification (RFID) systems have brought numerous conveniences to a multitude of applications, but their underlying wireless communication architecture makes them vulnerable to several security threats. To mitigate these issues, various authentication protocols have been proposed. The literature contains comprehensive proposals and analyses of authentication protocols, but few of them provide hardware implementations. In addition, RFID system components (tags, readers, database servers) impose diverse hardware area and throughput (TP) requirements, which call for a flexible implementation strategy. This paper proposes a flexible implementation strategy for Gimli, a lightweight authenticated encryption (AE) and hash function, and applies it to a state-of-the-art authentication protocol. This allows the authentication protocol to be implemented efficiently, with area and TP adjusted flexibly according to the RFID system requirements. The implementation strategy is generic: it can be used to implement other AE and hash functions, and it can also be applied to other authentication protocols that rely heavily on AE and hash functions. To provide a detailed analysis, the hardware area optimization techniques in each component of the RFID system are analyzed for a state-of-the-art authentication protocol. The most area-optimized versions achieve a TP of 740 Mbps and 420 Mbps for Gimli hash and Gimli AE, respectively, while the throughput-oriented implementations reach 3.08 Gbps and 1.43 Gbps, respectively. This shows that the proposed implementation strategies allow authentication protocols to be implemented flexibly to meet differing TP and area requirements in RFID applications.
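Both Gimli modes mentioned in the abstract (hash and AE) are built on the Gimli permutation of a 384-bit state. The sketch below is a minimal software model. It assumes the 3 × 4 array of 32-bit words and the 24-round structure (column SP-box, small/big swaps, round constant) described in the Gimli design document. It is a readability aid only and has not been checked here against the official test vectors. It also says nothing about the area/throughput trade-offs the paper explores, such as how many rounds are computed per clock cycle. Function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def gimli(state: list) -> list:
    """Gimli permutation on 12 words: row 0 = state[0:4], row 1 = [4:8], row 2 = [8:12]."""
    s = list(state)
    if len(s) != 12:
        raise ValueError("Gimli state is 12 32-bit words")
    for rnd in range(24, 0, -1):
        for col in range(4):  # non-linear SP-box applied to each column
            x = rotl32(s[col], 24)
            y = rotl32(s[4 + col], 9)
            z = s[8 + col]
            s[8 + col] = (x ^ (z << 1) ^ ((y & z) << 2)) & MASK
            s[4 + col] = (y ^ x ^ ((x | z) << 1)) & MASK
            s[col] = (z ^ y ^ ((x & y) << 3)) & MASK
        if (rnd & 3) == 0:  # small swap on the first row
            s[0], s[1] = s[1], s[0]
            s[2], s[3] = s[3], s[2]
        if (rnd & 3) == 2:  # big swap on the first row
            s[0], s[2] = s[2], s[0]
            s[1], s[3] = s[3], s[1]
        if (rnd & 3) == 0:  # round constant every fourth round
            s[0] ^= 0x9E377900 ^ rnd
    return s
```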


“…Modular Reduction. The authors of [9] propose K-RED and K-RED-2x functions to calculate the modular reduction, but the result is incomplete. For specific primes, such as $p = 3 \times 2^{12} + 1$ used in NewHope [10] and $p = (2^{32} - 1) \times 2^{32} + 1$ used in [4] [11] [12], they further design the modular reduction unit to achieve constant-time calculation. In this paper we specifically consider $p = (2^{32} - 1) \times 2^{32} + 1$, but this method is also useful for other primes of the form $k \cdot 2^m \pm l$.…”

Section: A Number Theoretic Transform (mentioning)

confidence: 99%
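The K-RED idea referred to above can be illustrated for $p = 3 \times 2^{12} + 1 = 12289$. Write $C = C_1 \cdot 2^m + C_0$. Because $k \cdot 2^m \equiv -1 \pmod p$, it follows that $k \cdot C \equiv k \cdot C_0 - C_1$, so a shift, a small multiplication and a subtraction replace a full division. The sketch below assumes the commonly used formulation of K-RED and K-RED-2x for primes of the form $k \cdot 2^m + 1$. The output is only congruent to $k \cdot C$ (or $k^2 \cdot C$), not to $C$, and it is not reduced into $[0, p)$. That behavior is consistent with the remark that the result is incomplete. The function names are hypothetical.

```python
Q, K, M = 12289, 3, 12  # q = k * 2**m + 1 (NewHope modulus)


def k_red(c: int) -> int:
    """K-RED: returns a value congruent to K*c mod Q (not fully reduced)."""
    c0 = c & ((1 << M) - 1)  # low m bits
    c1 = c >> M              # remaining high bits
    return K * c0 - c1       # uses k * 2**m == -1 (mod q)


def k_red_2x(c: int) -> int:
    """K-RED-2x: returns a value congruent to K*K*c mod Q (not fully reduced)."""
    c0 = c & ((1 << M) - 1)
    c1 = (c >> M) & ((1 << M) - 1)
    c2 = c >> (2 * M)
    # k^2 * 2^(2m) == 1 and k^2 * 2^m == -k (mod q)
    return K * K * c0 - K * c1 + c2
```

A common way to absorb the extra factor $k$ is to pre-multiply constants such as twiddle factors by $k^{-1} \bmod p$. A final correction into $[0, p)$ is still needed wherever a canonical result is required.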

Area-Efficient Modular Reduction Structure and Memory Access Scheme for NTT

Guo

2021

2021 IEEE International Symposium on Circuits and Systems (ISCAS)

Self Cite


Number theoretic transform (NTT) based multiplication is commonly used in post-quantum cryptography and is its most resource-consuming operation. In this paper, we propose an area-efficient modular reduction structure for generalized Mersenne primes based on interval prediction, and a novel memory access scheme that fetches two data items on the same side of a butterfly unit simultaneously. The interval prediction structure eliminates some adders in a modular multiplication. When implemented as a 3-stage pipeline and synthesized in a TSMC 90 nm process, this structure requires approximately 14.9% less area than other designs. The proposed memory access scheme is an in-place scheme. It is more regular than other designs, and the two memories share the same address. Based on this property, we construct an address generator that consumes 40% less area.
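The generalized Mersenne prime $p = (2^{32} - 1) \times 2^{32} + 1 = 2^{64} - 2^{32} + 1$ that appears in the citation statement above lends itself to shift-and-add reduction, because $2^{64} \equiv 2^{32} - 1$ and $2^{96} \equiv -1 \pmod p$. The sketch below folds a 128-bit product using these identities and then applies at most one correction. It is a generic illustration under that assumption and is not the interval-prediction structure proposed in the paper. `reduce_gm` is a hypothetical name.

```python
P = (2**32 - 1) * 2**32 + 1  # = 2**64 - 2**32 + 1
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def reduce_gm(x: int) -> int:
    """Reduce 0 <= x < 2**128 mod P via 2**64 = 2**32 - 1 and 2**96 = -1 (mod P)."""
    if not 0 <= x < 1 << 128:
        raise ValueError("input must be a 128-bit product")
    lo = x & MASK64           # x mod 2**64
    mid = (x >> 64) & MASK32  # coefficient of 2**64
    hi = x >> 96              # coefficient of 2**96
    t = lo + mid * MASK32 - hi  # congruent to x; -(2**32) < t < 2 * P
    if t < 0:
        t += P                # one addition brings t into [0, P)
    elif t >= P:
        t -= P                # one subtraction brings t into [0, P)
    return t
```

The bound noted next to `t` is what limits the final step to a single conditional addition or subtraction.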

