Exploiting SIMD Instructions in Modern Microprocessors to Optimize the Performance of Stream Ciphers

2013
DOI: 10.5815/ijcnis.2013.06.08

Abstract: Modern microprocessors are loaded with many performance optimization features. The Single Instruction Multiple Data (SIMD) instruction set, designed specifically to improve the performance of multimedia applications, is one of them, but most encryption algorithms do not use these features to their fullest. This paper discusses optimization principles that encryption algorithm designers should follow to exploit the features of the underlying processor to the maximum. It also analyses the performance…
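
As a concrete illustration of the kind of data-level parallelism the paper targets, the sketch below XORs a keystream into a message 16 bytes per iteration using SSE2 intrinsics instead of one byte at a time. This is a minimal, hypothetical example: the function name, the pre-generated `keystream` buffer, and the choice of SSE2 are assumptions, not code taken from the paper.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: XOR a pre-generated keystream into a message
 * 16 bytes at a time with SSE2 rather than one byte per loop iteration.
 * The cipher that produces `keystream` is not shown. */
static void xor_keystream_simd(uint8_t *msg, const uint8_t *keystream, size_t len)
{
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i m = _mm_loadu_si128((const __m128i *)(msg + i));
        __m128i k = _mm_loadu_si128((const __m128i *)(keystream + i));
        _mm_storeu_si128((__m128i *)(msg + i), _mm_xor_si128(m, k));
    }
    for (; i < len; i++)        /* scalar tail for the last few bytes */
        msg[i] ^= keystream[i];
}
```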

Cited by 3 publications (3 citation statements) · References 2 publications

“…The buffer can be small but should be big enough to ensure the whole loop is cached during the first sync-loop execution. After all threads arrive at the AP, the buffered instructions are transferred to the cache and the freed buffer entries are refilled with new instructions by prefetching; both the instruction transfer and the buffer refill can be performed in parallel, and this parallel processing also helps to reduce the buffer size 2. When the loop is fully cached, the cache segment related to the loop is locked.…”
Section: Design of Thread Synchronization for High Instruction Cache Locality
confidence: 99%
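
The mechanism described above is architectural, but a rough software analogue may make the sequence easier to follow. The sketch below is a highly simplified model under assumed names and sizes: a pthread barrier stands in for the arrival point (AP), and "locking" is just a flag on a cache array. It is not the cited hardware design.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified software model (names and sizes are assumptions): instructions
 * fetched during the first pass of the synchronization loop sit in a small
 * buffer; once every thread reaches the arrival point (modelled by a barrier),
 * the buffered instructions are copied into the instruction-cache model and
 * that segment is marked locked so the loop stays resident. */
#define BUF_WORDS   32
#define CACHE_WORDS 256

typedef struct {
    uint32_t buffer[BUF_WORDS];   /* prefetched loop instructions */
    uint32_t cache[CACHE_WORDS];  /* instruction cache model       */
    bool     segment_locked;      /* loop segment pinned in cache  */
    pthread_barrier_t arrival;    /* the arrival point (AP); init with
                                     pthread_barrier_init(&arrival, NULL, nthreads) */
} icache_sync_t;

void thread_reach_ap(icache_sync_t *s, int nwords)
{
    /* Wait until all threads have arrived at the AP; exactly one thread
     * then performs the transfer and locks the segment. The freed buffer
     * entries could now be refilled by prefetching (not modelled here). */
    if (pthread_barrier_wait(&s->arrival) == PTHREAD_BARRIER_SERIAL_THREAD) {
        memcpy(s->cache, s->buffer, (size_t)nwords * sizeof(uint32_t));
        s->segment_locked = true;
    }
    pthread_barrier_wait(&s->arrival);  /* ensure the lock is visible to all */
}
```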
“…We target applications that offer embarrassing parallelism, namely the same code is executed by a number of independent threads on different data sets. Such applications can be found in real-world computing problems such as encryption [2], scientific calculations [3], multimedia processing [4], and image processing on large data [5]. Such large computing problems demand multiprocessor system designs that can be built from small building-block processors like the one we discuss in this paper.…”
Section: Introduction
confidence: 98%
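
A minimal sketch of what "embarrassing parallelism" looks like in code may help: each block is processed by an independent iteration with no communication between threads. The function and parameter names are hypothetical, and OpenMP is used purely for illustration (compile with -fopenmp).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: each iteration processes an independent block, so
 * blocks can be distributed across threads with no inter-thread
 * communication. process_block() stands in for any per-block kernel,
 * e.g. encrypting one block with an independent keystream segment. */
void process_all_blocks(uint8_t *data, size_t nblocks, size_t block_size,
                        void (*process_block)(uint8_t *block, size_t len))
{
    #pragma omp parallel for schedule(static)
    for (size_t b = 0; b < nblocks; b++)
        process_block(data + b * block_size, block_size);
}
```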
“…In Ex (the execute stage), the functional units process the task and store its result back into a register file. Finally, the commit stage retires instructions from the ROB in program order [20][21][22][23][24]. This processing flow is comparable to the one implemented in the SimpleScalar tool set [8].…”
Section: Multi-processing and Pipelining During Simulation
confidence: 99%
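
To make the in-order commit step concrete, here is a small, hypothetical sketch of retiring completed entries from a circular reorder buffer into an architectural register file, oldest entry first. Field names and sizes are assumptions and are not taken from the cited simulator or from SimpleScalar.

```c
#include <stdbool.h>
#include <stdint.h>

/* Instructions complete out of order (the execute stage sets `done`), but
 * results are committed to the architectural register file strictly in
 * program order, starting from the oldest ROB entry. */
#define ROB_SIZE 64

typedef struct {
    bool     done;      /* execute stage has produced the result   */
    uint8_t  dest_reg;  /* architectural destination register       */
    uint64_t value;     /* result waiting to be committed           */
} rob_entry_t;

typedef struct {
    rob_entry_t entry[ROB_SIZE];
    int head;           /* oldest entry (next to retire) */
    int count;          /* number of occupied entries    */
} rob_t;

/* Retire up to `width` completed instructions per cycle, in program order. */
void commit_stage(rob_t *rob, uint64_t arch_regs[], int width)
{
    for (int n = 0; n < width && rob->count > 0; n++) {
        rob_entry_t *e = &rob->entry[rob->head];
        if (!e->done)
            break;                      /* oldest not finished: stall commit */
        arch_regs[e->dest_reg] = e->value;
        rob->head = (rob->head + 1) % ROB_SIZE;
        rob->count--;
    }
}
```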