2019
DOI: 10.1145/3309759

High-performance Implementation of Elliptic Curve Cryptography Using Vector Instructions

Abstract: Elliptic curve cryptosystems are considered an efficient alternative to conventional systems such as DSA and RSA. Recently, Montgomery and Edwards elliptic curves have been used to implement cryptosystems. In particular, the elliptic curves Curve25519 and Curve448 were used for instantiating Diffie-Hellman protocols named X25519 and X448. Mapping these curves to twisted Edwards curves allowed deriving two new signature instances, called Ed25519 and Ed448, of the Edwards Digital Signature Algorithm. In this wor…

citations
Cited by 28 publications
(35 citation statements)
references
References 28 publications
“…squaring) multiplies two pairs of 25 or 26-bit limbs in parallel, whereby two limbs belonging to one operand are stored in a 128-bit lane of an AVX2 register. In a recent follow-up work, Faz-Hernández et al. [8] presented fast 2-way and 4-way implementations of the field-arithmetic and point operations using both the Montgomery model and the Edwards model of Curve25519. There are various other studies exploring the optimization of ECC for different vector instruction sets, such as Intel SSE2, Intel AVX-512, and ARM NEON, see e.g.…”
Section: Overview Of Related Work and Motivation For Our Work (mentioning)
confidence: 99%
“…While such a canonical radix-2^n representation of integers has the advantage that the total number of words k = m/n is minimal for the target platform, it entails a lot of carry propagation and, as a consequence, sub-optimal performance on modern 64-bit processors [1,7]. Fortunately, it is possible to avoid most of the carry propagations by using a reduced-radix representation (also referred to as redundant representation [8]), which means the number of bits per limb n' is slightly less than the bitlength n of the processor's registers, e.g. n' = 51 when implementing Curve25519 for a 64-bit processor.…”
Section: Preliminaries (mentioning)
confidence: 99%
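
The reduced-radix idea in the statement above can be illustrated with a minimal Python sketch. It assumes five 51-bit limbs for the prime 2^255 - 19; the function names (`to_limbs`, `from_limbs`, `add_lazy`, `carry`) are hypothetical and chosen only for illustration, and the code is not taken from the cited paper or from any library. It shows only that limb-wise addition needs no carry propagation because each 64-bit word has spare bits, and that carries can be handled later in one pass.

```python
# Illustrative sketch only: reduced-radix (radix-2^51) limbs for GF(2^255 - 19).
# All names below are hypothetical, not from the cited work.
P = 2**255 - 19
LIMB_BITS = 51          # bits per limb, slightly below the 64-bit register width
NUM_LIMBS = 5           # 5 * 51 = 255 bits
MASK = (1 << LIMB_BITS) - 1


def to_limbs(x):
    """Split 0 <= x < 2^255 into five 51-bit limbs, least significant first."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(NUM_LIMBS)]


def from_limbs(limbs):
    """Recombine limbs (which may hold more than 51 bits) into one integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def add_lazy(a, b):
    """Limb-wise addition without carry propagation.

    Each result limb is below 2^52, so it still fits a 64-bit word.
    """
    return [x + y for x, y in zip(a, b)]


def carry(limbs):
    """One carry pass over the limbs.

    The carry out of the top limb is folded back into limb 0 multiplied
    by 19, because 2^255 is congruent to 19 modulo 2^255 - 19.
    """
    out = list(limbs)
    for i in range(NUM_LIMBS - 1):
        out[i + 1] += out[i] >> LIMB_BITS
        out[i] &= MASK
    top = out[-1] >> LIMB_BITS
    out[-1] &= MASK
    out[0] += 19 * top
    return out


a, b = P - 1, P - 2
s = add_lazy(to_limbs(a), to_limbs(b))
r = carry(s)
```

Here `s` holds the exact sum `a + b` in unnormalized limbs, and `r` represents the same value modulo `P` with limbs 1 to 4 back within 51 bits. Limb 0 may still exceed 51 bits slightly after the wrap-around, which a reduced-radix representation tolerates.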
“…For example, 2-way and 4-way parallel implementations of the point addition and point doubling were presented in e.g. [3,5,7] and [7], respectively; these execute either two or four field-arithmetic operations in parallel. Finally, there exist also implementations that combine parallelism at the field-arithmetic and point-arithmetic layer, which we characterize as (n×m)-way parallel implementations: they perform n field operations in parallel, whereby each field operation is executed in an m-way parallel fashion and uses m elements of a vector.…”
Section: Introduction (mentioning)
confidence: 99%
“…Finally, there exist also implementations that combine parallelism at the field-arithmetic and point-arithmetic layer, which we characterize as (n×m)-way parallel implementations: they perform n field operations in parallel, whereby each field operation is executed in an m-way parallel fashion and uses m elements of a vector. For example, Faz-Hernández et al. describe in [7] a (2 × 2)-way parallel AVX2 implementation of variable-base scalar multiplication on Curve25519 that executes in 121,000 Haswell cycles or 99,400 Skylake cycles. More recently, Hisil et al. [12] presented an AVX512 implementation of Curve25519 that is (4 × 2)-way parallelized (i.e.…”
Section: Introduction (mentioning)
confidence: 99%