2008
DOI: 10.1007/978-3-540-88643-3_5

How to Write Fast Numerical Code: A Small Introduction

Abstract: The complexity of modern computing platforms has made it extremely difficult to write numerical code that achieves the best possible performance. Straightforward implementations based on algorithms that minimize the operations count often fall short in performance by at least one order of magnitude. This tutorial introduces the reader to a set of general techniques to improve the performance of numerical code, focusing on optimizations for the computer's memory hierarchy. Further, program generators …

Citations: cited by 21 publications (19 citation statements)
References: 51 publications
“…A tutorial for a recursive radix-4 FFT is given in [29], leading to about two pages of C code. Extension to vector and multicore platforms considerably increases the code size: for example, FFTW contains more than 200,000 lines of code.…”
Section: General-size Recursive Code (citation type: mentioning)
confidence: 99%
“…Equation (22) can be used as outermost recursion to enable multicore parallelization. The smaller DFTs are then expanded using the short vector Cooley-Tukey FFT (23) or the vector recursion (29) shown later in this section.…”
Section: Multicore Cooley-Tukey FFT (citation type: mentioning)
confidence: 99%
“…We define the matrix-multiplication MMM_{m,k,n} as an operator that consumes two matrices and produces one:…”
Section: Operator Language: Kernels and Algorithms (citation type: mentioning)
confidence: 99%
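
As a rough illustration of the operator quoted above, the sketch below assumes the common convention that the operator takes an m × k matrix and a k × n matrix and returns their m × n product; the quoted snippet itself does not state the dimensions. The function name `mmm` and the list-of-lists representation are hypothetical choices for this example, not taken from the citing paper.

```python
def mmm(a, b):
    """Multiply an m x k matrix by a k x n matrix (assumed convention).

    Matrices are plain lists of rows; the result is a new m x n matrix.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # i-p-j loop order: the innermost loop walks one row of b and one row
    # of c contiguously, which is friendlier to the memory hierarchy than
    # striding down a column of b.
    for i in range(m):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(n):
                c[i][j] += a_ip * b[p][j]
    return c
```

This is a straightforward triple loop, not a tuned kernel: it consumes two matrices and produces one, and leaves out the blocking and unrolling that a performance-oriented implementation would add.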
“…This strategy is chosen, for example, in the library FFTW 2.x and the code can be sketched as shown in Figure 2. A simplified description of performing this process by hand can be found in [13].…”
Section: Introduction (citation type: mentioning)
confidence: 99%