2011 17th International Conference on Digital Signal Processing (DSP)
DOI: 10.1109/icdsp.2011.6004879

QTIB: Quick bit-reversed permutations on CPUs

Abstract: We present a fast algorithm for out-of-place bit-reversed permutation of large vectors for input to an FFT. It extends two previously published methods, with special attention to advanced CPU hardware features. In particular, the method makes heavy use of cache prefetching, MMX and SSE units, and write-combining buffers. Implementations have been written in assembly language for 2-byte and 4-byte operands. In terms of efficiency, the method significantly outperforms previously reported methods.
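For readers unfamiliar with the operation, the following is a minimal reference sketch of an out-of-place bit-reversed permutation. It is not the method of the paper: it uses none of the prefetching, SIMD, or write-combining techniques named in the abstract, and the function names `bit_reverse` and `bit_reversed_copy` are hypothetical, chosen only for illustration. It shows the index mapping that any optimized implementation has to reproduce.

```python
def bit_reverse(i: int, bits: int) -> int:
    """Return i with its lowest `bits` bits in reversed order."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)  # shift result left, append lowest bit of i
        i >>= 1
    return r


def bit_reversed_copy(src):
    """Out-of-place bit-reversed permutation (naive baseline).

    Element i of `src` is written to position bit_reverse(i) of a new list.
    The length of `src` must be a power of two.
    """
    n = len(src)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1  # log2(n)
    dst = [None] * n
    for i, x in enumerate(src):
        dst[bit_reverse(i, bits)] = x
    return dst


if __name__ == "__main__":
    print(bit_reversed_copy(list(range(8))))
```

In this naive form the reads are sequential but the writes are scattered across the destination with large power-of-two strides, which is generally unfriendly to caches for large vectors. That access pattern is the kind of cost that cache-aware approaches to this permutation aim to reduce.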

Citations: cited by 1 publication (1 citation statement)
References: 5 publications

“…Cache-efficient index permutations The permutation of the indexes necessary as a preparatory step for efficient matrix multiplications can be very costly for large tensors, since it involves the reordering of virtually all entries of the tensors in memory; similar issues have been an area of study in other contexts [45][46][47]. In this section we describe our novel cache-efficient implementation of the permutation of tensor indexes.…”
Section: Implementation of the Simulator (mentioning)
confidence: 99%