FFTW: an adaptive software architecture for the FFT

Frigo, Matteo; Johnson, Steven G.

doi:10.1109/icassp.1998.681704

Cited by 1,327 publications

(1,032 citation statements)

References 16 publications

Supporting

Mentioning

1,020

Contrasting

Unclassified

“…All algorithms were implemented in C and tested on an AMD Athlon™ XP 2700+ with 2GB main memory, SuSe-Linux (kernel 2.6.5-7.151-default, gcc 3.3.5) using double precision arithmetic. Moreover, we have used the libraries FFTW 3.0.1 [6] and the NFFT 3.0 library [8], now including the fast NFSFT algorithms. Throughout our experiments we have applied the NFFT routines with precomputed Kaiser-Bessel functions and an oversampling factor of two.…”

Section: Examples (mentioning)

confidence: 99%

Fast evaluation of quadrature formulae on the sphere

Keiner

Potts

2008

Math. Comp.

Abstract. Recently, a fast approximate algorithm for the evaluation of expansions in terms of standard L²(S²)-orthonormal spherical harmonics at arbitrary nodes on the sphere S² has been proposed in [S. Kunis and D. Potts. Fast spherical Fourier algorithms. J. Comput. Appl. Math., 161:75-98, 2003]. The aim of this paper is to develop a new fast algorithm for the adjoint problem which can be used to compute expansion coefficients from sampled data by means of quadrature rules. We give a formulation in matrix-vector notation and an explicit factorisation of the spherical Fourier matrix based on the former algorithm. Starting from this, we obtain the corresponding factorisation of the adjoint spherical Fourier matrix and are able to describe the associated algorithm for the adjoint transformation which can be employed to evaluate quadrature rules for arbitrary weights and nodes on the sphere. We provide results of numerical tests showing the stability of the obtained algorithm using as examples classical Gauß-Legendre and Clenshaw-Curtis quadrature rules as well as the HEALPix pixelation scheme and an equidistribution.

“…Current limited prototypes for dense matrix multiplication (ATLAS [26] and PHIPAC [5]), sparse matrix-vector multiplication (Sparsity [17,16]), and FFTs (FFTW [13,12]) show that we can frequently do as well as or even better than hand-tuned vendor code on the kernels attempted.…”

Section: Libraries (mentioning)

confidence: 99%

Self-Adapting Numerical Software and Automatic Tuning of Heuristics

Dongarra

Eijkhout

2003

Lecture Notes in Computer Science

Self-Adapting Numerical Software (SANS) systems aim to bridge the gap between the expertise of domain scientists and the know-how needed to meet their computational demands efficiently. This know-how extends to algorithm choice, computational grid utilization, and use of properly optimized kernels. A SANS system is a piece of meta-software that mediates between the application program and the computational platform so that application scientists, with disparate levels of knowledge of the algorithmic and programmatic complexities of the underlying numerical software, can easily realize numerical solvers and efficiently solve their problems.

The main component of a SANS system is an Intelligent Agent (IA) that automates method selection based on data, algorithm, and system attributes. The IA uses heuristics to make its decisions. In this paper we explain how the heuristics of the IA can be tuned over time by redundant testing and using the nature of many applications.

Introduction

In numerous technologically important areas, such as aerodynamics (vehicle design), electrodynamics (semiconductor device design), magnetohydrodynamics (fusion energy device design), and porous media (petroleum recovery), production runs on expensive, high-end systems last for hours or days, and a major portion of the execution time is usually spent inside numerical routines, such as those for the solution of large-scale nonlinear and linear systems that derive from discretized systems of nonlinear partial differential equations. These numerical parts of the code can contain a large number of tuning parameters, the choice of which greatly influences the efficiency of the total code, or can even make the difference between obtaining a solution and obtaining none.

Such numerical concerns, however, are artifactual from the perspective of the scientific and engineering users, who are usually more concerned with modeling and discretization issues. The classic response to numerics was to encode the requisite mathematical, algorithmic, and programming expertise into libraries that could be easily reused by a broad spectrum of domain scientists. However, in high-performance computing this solution is no longer sufficient. There is typically more than one algorithm for a stated purpose, and several levels of algorithms are needed in a large-scale application; the different algorithms can have interlocking parameter settings, and the availability of parallel computing platforms influences algorithmic decisions. Since the difference in performance between an optimal choice of algorithm and hardware and a less than optimal one can span orders of magnitude, it is unfortunate that selecting the right solution strategy requires specialized knowledge of both numerical analysis and computing platform characteristics. Our SANS system aims to assist the application user in navigating this maze of computational possibilities.
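The automated method selection described in the abstract above can be illustrated with a minimal sketch, assuming the simplest possible heuristic: time each candidate algorithm on a sample input and keep the fastest. This is not the SANS Intelligent Agent and not FFTW's planner; the names `dft_naive`, `fft_radix2`, and `pick_fastest` are hypothetical and exist only for this illustration.

```python
import cmath
import time


def dft_naive(x):
    """Direct O(n^2) evaluation of the discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT.

    Falls back to the naive DFT when the length is odd and greater than 1.
    """
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        return dft_naive(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def pick_fastest(candidates, sample, repeats=3):
    """Return the name of the candidate with the smallest measured run time.

    `candidates` maps a name to a callable; each is timed `repeats` times
    on `sample` and the minimum time per candidate is compared.
    """
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(sample)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


if __name__ == "__main__":
    sample = [complex(i % 7, 0) for i in range(256)]
    candidates = {"naive": dft_naive, "radix2": fft_radix2}
    print("selected:", pick_fastest(candidates, sample))
```

The choice is measured rather than predicted, so the selected name can differ between machines and input sizes, which is the point of tuning empirically instead of hard-coding one algorithm.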

“…The one-dimensional, Q-point FFTs in steps 1 and 5 of the PCFFT are computed with the 1-D FFTW [4] package. Table 2 compares the execution times in seconds of the 3-D FFTW [4] and our parallel crystallographic FFT. We also computed the speed up ratios between the 3-D FFTW and the PCFFT.…”

Section: Computer Experiments (mentioning)

confidence: 99%

“…The less classical but more common speed-up definition, which is the ratio between the execution times of the parallel method in P processors and the parallel method in one processor, is meaningless in our case, since the PCFFT in one processor is on average, far less efficient than the 3-D FFTW. We also compared the PCFFT run times with those of the parallel MPI-FFTW [4]. It turns out, however, that due to the load unbalancing induced by the irregularity of the problem, and above all, the large data array transpositions and communications that were required, the MPI-FFTW performed very poorly in our system.…”

Section: Computer Experiments (mentioning)

confidence: 99%
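The two speed-up definitions contrasted in the citation statements above can be sketched as follows. This is a minimal illustration that assumes the conventional forms of both ratios (serial reference time over parallel time, and one-processor parallel time over P-processor parallel time); the function names and the timings in the usage example are hypothetical and are not taken from the cited paper.

```python
def speedup_vs_serial_reference(t_serial_reference, t_parallel_p):
    """Speed-up of a parallel method over a separate serial reference code."""
    return t_serial_reference / t_parallel_p


def relative_speedup(t_parallel_one, t_parallel_p):
    """Speed-up of a parallel method over its own one-processor run."""
    return t_parallel_one / t_parallel_p


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only to show the effect.
    t_serial_reference = 2.0  # efficient serial code
    t_parallel_one = 6.0      # parallel method on one processor
    t_parallel_four = 1.5     # parallel method on four processors

    print(speedup_vs_serial_reference(t_serial_reference, t_parallel_four))
    print(relative_speedup(t_parallel_one, t_parallel_four))
```

With these made-up numbers the relative speed-up looks ideal on four processors while the gain over the serial reference is much smaller, which is why a one-processor baseline that is itself slow makes the relative ratio uninformative.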

A Parallel Prime Edge-Length Crystallographic FFT

Seguel

Burbano

2003

Lecture Notes in Computer Science

Abstract. Although other methods are available, computational X-ray crystallography is still the most accurate way of determining the atomic structure of crystals. For large-scale problems such as protein or virus structure determination, a huge number of three-dimensional discrete Fourier transforms (DFTs) form the core computation in these methods. Despite the fact that highly efficient fast Fourier transform (FFT) implementations are available, significant improvements can be obtained by using FFT variants tailored to crystal structure calculations. These variants, or crystallographic FFTs, use a priori knowledge of the specimen's crystal symmetries to lower the operation count and storage requirement of a usual, asymmetric FFT. The design and implementation of crystallographic FFTs bring about several problems of their own. And, as is usually the case with prime length FFTs, prime edge-length crystallographic FFTs pose the hardest challenges among them. This paper develops and tests a parallel multidimensional crystallographic FFT of prime edge-length, whose performance is significantly better than that of the usual FFT.
