2021
DOI: 10.1007/s13389-021-00256-9

Parallel modular multiplication using 512-bit advanced vector instructions

Abstract: Applications such as public-key cryptography are critically reliant on the speed of modular multiplication for their performance. This paper introduces a new block-based variant of Montgomery multiplication, the Block Product Scanning (BPS) method, which is particularly efficient using new 512-bit advanced vector instructions (AVX-512) on modern Intel processor families. Our parallel-multiplication approach also allows for squaring and sub-quadratic Karatsuba enhancements. We demonstrate $$1.9\,\times $$ …

Citations: cited by 6 publications (2 citation statements)
References: 23 publications
“…The information and latency of field multiplication in both versions are shown in Table 2, which indicates that our Karatsuba-based AVX-512F implementation outperforms the BPS variant in [BGH21]. We herein emphasize on the importance of using an optimal field multiplication in such parallel AVX-512 software of an isogeny-based cryptosystem.…”
Section: Field Multiplication (mentioning)
confidence: 89%
“…Takahashi proposed both AVX-512F and IFMA implementation of 8-way Montgomery multiplication in [Tak20], but this software works on 62-bit and 52-bit operands, respectively, and not in the case of large integers. Buhrow, Gilbert, and Haider in [BGH21] presented a Block Product Scanning (BPS) variant of Montgomery multiplication, which is based on radix-$$2^{32}$$ representation. An 8-way 512-bit BPS variant implemented with AVX-512F takes 189 clock cycles for each instance, which translates to 1512 clock cycles for a whole 8-way implementation.…”
Section: Field Multiplication (mentioning)
confidence: 99%
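
For readers unfamiliar with the radix-$$2^{32}$$ representation mentioned in the statement above, the following is a minimal scalar sketch of word-level Montgomery multiplication on 32-bit limbs. It uses the textbook coarsely integrated operand scanning (CIOS) loop order, not the BPS method of [BGH21], and it is neither vectorized nor constant-time. The names `to_limbs`, `from_limbs`, and `mont_mul` are hypothetical and chosen only for this illustration; Python 3.8+ is assumed for `pow(m, -1, mod)`.

```python
W = 32                      # limb width in bits (radix 2^32)
MASK = (1 << W) - 1


def to_limbs(x, n):
    """Split integer x into n little-endian 32-bit limbs."""
    return [(x >> (W * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Recombine little-endian 32-bit limbs into an integer."""
    return sum(l << (W * i) for i, l in enumerate(limbs))


def mont_mul(a, b, m, n):
    """Return a * b * R^-1 mod m with R = 2^(32*n).

    Assumes m is odd, m < R, and 0 <= a, b < m.
    Illustrative CIOS variant; not the BPS method.
    """
    m_inv = (-pow(m, -1, 1 << W)) & MASK      # -m^-1 mod 2^32
    al, bl, ml = to_limbs(a, n), to_limbs(b, n), to_limbs(m, n)
    t = [0] * (n + 2)                         # accumulator with 2 spare limbs
    for i in range(n):
        # Multiplication step: t += a * b[i]
        carry = 0
        for j in range(n):
            s = t[j] + al[j] * bl[i] + carry
            t[j] = s & MASK
            carry = s >> W
        s = t[n] + carry
        t[n] = s & MASK
        t[n + 1] = s >> W
        # Reduction step: pick q so the low limb of t + q*m is zero,
        # then add q*m and shift t right by one limb.
        q = (t[0] * m_inv) & MASK
        s = t[0] + q * ml[0]
        carry = s >> W
        for j in range(1, n):
            s = t[j] + q * ml[j] + carry
            t[j - 1] = s & MASK
            carry = s >> W
        s = t[n] + carry
        t[n - 1] = s & MASK
        t[n] = t[n + 1] + (s >> W)
    r = from_limbs(t[: n + 1])
    return r - m if r >= m else r             # final conditional subtraction
```

With `n = 16` limbs this handles 512-bit operands, the operand size quoted in the statement. In the 8-way setting described there, eight such independent multiplications would occupy the eight 64-bit lanes of an AVX-512 register; the sketch shows only a single instance.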