2016
DOI: 10.1145/2876503

Modular SIMD arithmetic in Mathemagix

Abstract: Modular integer arithmetic occurs in many algorithms for computer algebra, cryptography, and error correcting codes. Although recent microprocessors typically offer a wide range of highly optimized arithmetic functions, modular integer operations still require dedicated implementations. In this article, we survey existing algorithms for modular integer arithmetic and present detailed vectorized counterparts. We also describe several applications, such as fast modular Fourier transforms and multiplication of in…
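
The basic building block discussed throughout the paper is modular multiplication with a precomputed reciprocal of the modulus. As a point of reference, here is a minimal scalar sketch of Barrett-style reduction for odd moduli below 2^31, written in C++ with the GCC/Clang __int128 extension; it illustrates the general technique rather than the paper's exact variant, and all names are illustrative.

#include <cstdint>

// Barrett reduction sketch for an odd modulus 1 < m < 2^31 (illustrative,
// not the exact variant used in the Mathemagix paper).
struct barrett32 {
    uint32_t m;    // modulus
    uint64_t mu;   // precomputed floor(2^64 / m)

    explicit barrett32(uint32_t mod)
        : m(mod),
          // For odd m, floor((2^64 - 1) / m) equals floor(2^64 / m).
          mu(~uint64_t(0) / mod) {}

    // Reduce x < m^2 modulo m.
    uint32_t reduce(uint64_t x) const {
        // q approximates floor(x / m); needs a 64x64 -> 128-bit multiply.
        uint64_t q = (uint64_t)(((unsigned __int128)x * mu) >> 64);
        uint64_t r = x - q * m;      // 0 <= r < 2*m when m < 2^31
        if (r >= m) r -= m;          // single conditional correction
        return (uint32_t)r;
    }

    // (a * b) mod m for a, b < m.
    uint32_t mul(uint32_t a, uint32_t b) const {
        return reduce((uint64_t)a * b);
    }
};

The precomputation is done once per modulus, so each operation costs two multiplications, a subtraction, and one conditional correction; the vectorized kernels discussed below reproduce this same shape lane by lane.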

Cited by 6 publications (3 citation statements)
References 36 publications
“…When we turn pseudo-reductions into reductions, we are dealing with integers that fit inside one machine word. Therefore we have optimized our implementation by coding a SIMD version of the Barrett reduction [Barrett 1986], as explained in [Hoeven et al 2016]. Finally, to minimize cache misses we store modular matrices contiguously, as ((A rem m₁), (A rem m₂), …”
Section: Methods
confidence: 99%
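
To give an idea of what such a vectorized kernel can look like, here is a hedged AVX2 sketch of a Barrett-style multiplication with a precomputed quotient for 16-bit moduli, processing sixteen residues per 256-bit register. It is not the kernel of [Hoeven et al 2016], which targets larger word sizes, but it follows the same pattern: a scaled reciprocal precomputed for the fixed operand, a high-half multiply to estimate the quotient, and one lane-wise conditional correction. The function names and the 16-bit parameter choice are assumptions made for this example.

#include <immintrin.h>
#include <cstdint>

// Precompute b' = floor(b * 2^16 / m) for a fixed operand b < m < 2^15.
static inline uint16_t precompute_quotient(uint16_t b, uint16_t m) {
    return (uint16_t)(((uint32_t)b << 16) / m);
}

// Lane-wise (a * b) mod m, with b and b' = floor(b * 2^16 / m) broadcast in
// vb and vbp, and the modulus m < 2^15 broadcast in vm.  Results lie in [0, m).
static inline __m256i mulmod_epu16(__m256i va, __m256i vb, __m256i vbp, __m256i vm) {
    __m256i q  = _mm256_mulhi_epu16(va, vbp);       // quotient estimate floor(a*b'/2^16)
    __m256i lo = _mm256_mullo_epi16(va, vb);        // a*b  mod 2^16
    __m256i qm = _mm256_mullo_epi16(q, vm);         // q*m  mod 2^16
    __m256i r  = _mm256_sub_epi16(lo, qm);          // true remainder lies in [0, 2m), so the low 16 bits suffice
    // Branch-free conditional subtraction: r - m wraps around when r < m,
    // so the unsigned minimum selects the reduced value.
    return _mm256_min_epu16(r, _mm256_sub_epi16(r, vm));
}

This per-operand precomputation pays off when one factor is reused many times, as with the twiddle factors of a modular FFT or the entries of a fixed matrix.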
“…Note that already slightly larger moduli will allow one to substantially increase the possible bitsize of coefficients: with our forthcoming technique, we will be able to multiply polynomials up to degree 2^22 and of coefficient bitsize 2^20 using primes of 42 bits. This is particularly interesting since FFT performance is hardly penalized when one uses primes of up to 53 bits instead of 32-bit primes, as demonstrated in [Hoeven et al 2016].…”
Section: A:15
confidence: 99%
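
The remark about 53-bit primes refers to carrying out the FFT's modular arithmetic with double-precision (SIMD) floating-point instructions. Below is a minimal scalar sketch of that style of modular multiplication with a precomputed floating-point reciprocal; the bound of roughly 2^50 on the modulus is a conservative assumption for this simple variant, and the paper's vectorized kernels go further. Names are illustrative.

#include <cstdint>

// (a * b) mod m via a floating-point quotient estimate, for a, b < m < 2^50
// (a conservative bound for this simple variant).
static inline uint64_t mulmod_fp(uint64_t a, uint64_t b, uint64_t m) {
    // When b is fixed (e.g. a twiddle factor), binv is precomputed once.
    double binv = (double)b / (double)m;
    // The estimate of floor(a*b/m) is off by at most one in either direction.
    uint64_t q = (uint64_t)((double)a * binv);
    // Exact remainder in wrap-around arithmetic: the true value lies in
    // [-m, 2m), which fits in a signed 64-bit integer.
    int64_t r = (int64_t)(a * b - q * m);
    if (r < 0)                 r += (int64_t)m;
    else if (r >= (int64_t)m)  r -= (int64_t)m;
    return (uint64_t)r;
}

Because every step maps to ordinary double-precision multiplies and integer subtractions, loops built on this primitive vectorize directly with SSE/AVX double instructions, which is why moving from 32-bit to roughly 50-bit primes costs little, as the citation above notes.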
“…Whilst there has been extensive research on the optimization of modular multiplications for GPUs, 17-20 efforts to optimize modular operations for SIMD instruction sets have been somewhat more scarce and focused on the x86 architecture. Examples of these efforts include optimizations for several x86 CPUs supporting SSE2 (Streaming SIMD Extensions 2), 17 efficient implementations of modular operations for the SSE and AVX instruction sets (in particular SSE4.2 and AVX2 for the Barrett and Montgomery methods), 21 and the implementation of an efficient modular multiplication algorithm for AVX-512. 22 To the best of our knowledge, there has been no similar work concerning the implementation of efficient modular arithmetic methods for Arm SVE.…”
Section: Related Work
confidence: 99%