2010
DOI: 10.1007/978-3-642-14390-8_50

Montgomery Multiplication on the Cell

Abstract: A technique to speed up Montgomery multiplication targeted at the Synergistic Processor Elements (SPE) of the Cell Broadband Engine is proposed. The technique consists of splitting a number into four consecutive parts. These parts are placed one by one in each of the four element positions of a vector, representing columns in a 4-SIMD organization. This representation enables arithmetic to be performed in a 4-SIMD fashion. An implementation of the Montgomery multiplication using this technique is up t…
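The data layout described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' SPE code. It assumes a number of `n_words` words of `w` bits each (the default `w = 16` is an assumption), with `n_words` divisible by four, and it models each 4-lane vector as a plain tuple. `to_lanes`, `from_lanes` and `montgomery_multiply` are hypothetical helper names. The Montgomery routine is the textbook whole-integer reduction (REDC), included only to show the operation that the 4-SIMD representation is meant to accelerate. It does not reproduce the vectorized arithmetic.

```python
from typing import List, Tuple

# A 4-lane SIMD register, modelled as a plain tuple (illustrative assumption).
Vector = Tuple[int, int, int, int]


def to_lanes(x: int, n_words: int, w: int = 16) -> List[Vector]:
    """Split x into four consecutive parts of n_words // 4 words each.

    Part j occupies lane j, so vector k holds word k of every part:
    lane j of vector k is word j * m + k of x, where m = n_words // 4.
    """
    if n_words % 4 != 0 or not 0 <= x < (1 << (n_words * w)):
        raise ValueError("need n_words divisible by 4 and x < 2**(n_words * w)")
    m = n_words // 4
    mask = (1 << w) - 1
    words = [(x >> (w * i)) & mask for i in range(n_words)]  # least significant first
    return [tuple(words[j * m + k] for j in range(4)) for k in range(m)]


def from_lanes(vectors: List[Vector], w: int = 16) -> int:
    """Inverse of to_lanes: reassemble the integer from its 4-lane vectors."""
    m = len(vectors)
    x = 0
    for k, vec in enumerate(vectors):
        for j, word in enumerate(vec):
            x += word << (w * (j * m + k))
    return x


def montgomery_multiply(a: int, b: int, n: int, r_bits: int) -> int:
    """Textbook Montgomery multiplication: a * b * R^-1 mod n, with R = 2**r_bits.

    Requires n odd, n < R and 0 <= a, b < n.
    """
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r           # n * n_prime == -1 (mod R), Python 3.8+
    t = a * b
    m = ((t & (r - 1)) * n_prime) & (r - 1)  # m = (t mod R) * n_prime mod R
    u = (t + m * n) >> r_bits                # t + m * n is divisible by R; u < 2n
    return u - n if u >= n else u


if __name__ == "__main__":
    x = sum(i << (16 * i) for i in range(8))  # eight 16-bit words: 0, 1, ..., 7
    print(to_lanes(x, 8))                     # [(0, 2, 4, 6), (1, 3, 5, 7)]
    n, R = (1 << 61) - 1, 1 << 64
    a, b = 123456789, 987654321
    # Multiplying Montgomery forms aR and bR yields the Montgomery form abR.
    print(montgomery_multiply(a * R % n, b * R % n, n, 64) == a * b * R % n)  # True
```

With eight words, each of the four lanes receives two consecutive words of the number, so lane j across the two vectors holds part j. This is the "columns in a 4-SIMD organization" view in which the same instruction can operate on all four parts at once.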

Cited by 10 publications (15 citation statements)
References 10 publications

“…In Figure 1, we designed multi-precision multiplication for SIMD architecture. Taking the 32-bit word with 256-bit multiplication as an example, our method works as follows⁵. Firstly, we re-organized operands by conducting transpose operation, which can efficiently shuffle inner vector by 32-bit wise.…”
Section: Cascade Operand Scanning Multiplication for SIMD (mentioning)
“…Firstly, we re-organized operands by conducting transpose operation, which can efficiently shuffle inner vector by 32-bit wise. Instead of a normal order ((B[0], B[1]), (B[2], B[3]), (B[4], B[5]), (B[6], B[7])), we actually classify the operand as groups ((B[0], B[4]), (B[2], B[6]), (B[1], B[5]), (B[3], B[7])) for computing multiplication where each operand ranges from 0 to $2^{32} - 1$ (0xffff ffff in hexadecimal form). Secondly, multiplication […] [7])) where the results are located from 0 to $2^{64} - 2^{33} + 1$ (0xffff fffe 0000 0001).…”
Section: Cascade Operand Scanning Multiplication for SIMD (mentioning)
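The operand regrouping and the product bound quoted in these citation statements come from a citing paper on SIMD multi-precision multiplication, not from the Cell paper itself. A small Python sketch can illustrate both. `regroup_operand`, `partial_products`, `GROUP_ORDER` and `WORD_MAX` are hypothetical names. The sketch reproduces only the stated group order and the stated range of a 32×32-bit product. It does not model the transpose instruction or the vector registers the citing authors used.

```python
from typing import List, Sequence, Tuple

# Group order quoted in the citation statement:
# (B[0], B[4]), (B[2], B[6]), (B[1], B[5]), (B[3], B[7]).
GROUP_ORDER = (0, 2, 1, 3)
WORD_MAX = 0xFFFFFFFF  # largest 32-bit word, 2**32 - 1


def regroup_operand(b: Sequence[int]) -> List[Tuple[int, int]]:
    """Pair word i with word i + 4 of an 8-word (256-bit) operand,
    emitting the pairs in the quoted group order."""
    if len(b) != 8 or any(not 0 <= word <= WORD_MAX for word in b):
        raise ValueError("expected eight 32-bit words")
    return [(b[i], b[i + 4]) for i in GROUP_ORDER]


def partial_products(a_word: int, b: Sequence[int]) -> List[Tuple[int, int]]:
    """Multiply one 32-bit word of A by every regrouped word of B.

    Each product is at most (2**32 - 1)**2 = 2**64 - 2**33 + 1,
    so it fits in a 64-bit lane without overflow.
    """
    if not 0 <= a_word <= WORD_MAX:
        raise ValueError("a_word must fit in 32 bits")
    return [(a_word * lo, a_word * hi) for lo, hi in regroup_operand(b)]


if __name__ == "__main__":
    print(regroup_operand(list(range(8))))  # [(0, 4), (2, 6), (1, 5), (3, 7)]
    worst = partial_products(WORD_MAX, [WORD_MAX] * 8)
    print(hex(worst[0][0]))                 # 0xfffffffe00000001
```

The worst case follows from $(2^{32} - 1)^2 = 2^{64} - 2 \cdot 2^{32} + 1 = 2^{64} - 2^{33} + 1$, which matches the upper bound 0xffff fffe 0000 0001 quoted above.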
“…A parallel software approach describing systolic (a specific arrangement of processing units used in parallel computations) Montgomery multiplication is described in [10,23]. An approach using the vector instructions on the Cell microprocessor is considered in [8]. Exploiting much larger parallelism using the single instruction multiple threads paradigm, is realized by using a residue number system [14,29] as described in [4].…”
Section: Related Work (mentioning)
“…The research community has studied ways to reduce the latency of Montgomery multiplication by parallelizing this computation. These approaches vary from using the SIMD paradigm [8,10,18,23] to the single instruction, multiple threads paradigm using a residue number system [14,29] as described in [4,19] (see Sect. 2.3 for a more detailed overview).…”
Section: Introduction (mentioning)