Rapid development of high performance floating-point pipelines for scientific simulation

Lienhart, Kugel, Manner

doi:10.1109/ipdps.2006.1639439

Cited by 4 publications (2 citation statements)

References 11 publications

“…However, floating-point arithmetic, especially floating-point divide and square root, is difficult to design and is often the critical, performance-limiting factor. In particular, applications that require floating-point divide and square root include transcranial magnetic stimulation [Cret et al 2007], Molecular Dynamics (MD) simulations [Govindu et al 2005], Monte Carlo radiative heat transfer simulation [Gokhale et al 2004], sparse matrix Jacobi solver, QR decomposition [Wang and Leeser 2007a], the smoothed particle hydrodynamics method [Lienhart et al 2002], gravity calculation for N-body simulation [Lienhart et al 2006], radiation dose calculation [Whitton et al 2006], optical flow algorithms for image stabilization [Etiemble et al 2005], and Least Mean Squares (LMS) and Maximum Likelihood (ML) for space systems [Poznanovic 2004]. As a result, a floating-point library including floating-point divide and square root is very desirable.…”

Section: Discussion (mentioning)

confidence: 99%

VFloat

Xiao-jun

Leeser

2010

ACM Trans. Reconfigurable Technol. Syst.


Optimal reconfigurable hardware implementations may require the use of arbitrary floating-point formats that do not necessarily conform to IEEE specified sizes. We present a variable precision floating-point library (VFloat) that supports general floating-point formats including IEEE standard formats. Most previously published floating-point formats for use with reconfigurable hardware are subsets of our format. Custom datapaths with optimal bitwidths for each operation can be built using the variable precision hardware modules in the VFloat library, enabling a higher level of parallelism. The VFloat library includes three types of hardware modules for format control, arithmetic operations, and conversions between fixed-point and floating-point formats. The format conversions allow for hybrid fixed- and floating-point operations in a single design. This gives the designer control over a large number of design possibilities, including format as well as number range, within the same application. In this article, we give an overview of the components in the VFloat library and demonstrate their use in an implementation of the K-means clustering algorithm applied to multispectral satellite images.
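The abstract describes floating-point formats whose exponent and mantissa widths need not match IEEE sizes, plus conversions between fixed-point and floating-point. VFloat itself is a hardware module library; the sketch below is only a software model for exploring such formats, and the names `quantize`, `fixed_to_float` and `float_to_fixed` are hypothetical, not VFloat's interface. It assumes an IEEE-like layout (hidden leading 1, biased exponent, all-ones exponent reserved, gradual underflow, round-to-nearest-even); VFloat's actual handling of these details may differ.

```python
import math


def quantize(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value of a hypothetical IEEE-like format with
    1 sign bit, exp_bits exponent bits and man_bits stored fraction bits.

    Assumed model: hidden leading 1, bias 2**(exp_bits-1) - 1, all-ones exponent
    reserved for inf/NaN, gradual underflow, round-to-nearest-even.
    Intended for formats no wider than IEEE double (exp_bits <= 11, man_bits <= 52).
    """
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    bias = (1 << (exp_bits - 1)) - 1
    emin, emax = 1 - bias, bias              # range of normal (unbiased) exponents
    _, e = math.frexp(abs(x))                # abs(x) = f * 2**e with 0.5 <= f < 1
    e = max(e - 1, emin)                     # unbiased exponent; clamping gives subnormal spacing
    ulp = 2.0 ** (e - man_bits)              # spacing of representable values near x
    q = round(abs(x) / ulp) * ulp            # Python's round() breaks ties to even
    max_finite = (2.0 - 2.0 ** -man_bits) * 2.0 ** emax
    if q > max_finite:
        q = math.inf                         # overflow rounds to infinity
    return math.copysign(q, x)


def fixed_to_float(raw: int, frac_bits: int, exp_bits: int, man_bits: int) -> float:
    """Interpret raw as a signed fixed-point value with frac_bits fractional bits
    and convert it to the custom floating-point format."""
    return quantize(raw / (1 << frac_bits), exp_bits, man_bits)


def float_to_fixed(x: float, frac_bits: int, total_bits: int) -> int:
    """Convert x to a signed total_bits-wide fixed-point integer, rounding to
    nearest (ties to even) and saturating at the representable range."""
    raw = round(x * (1 << frac_bits))
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, raw))


# The same value in IEEE half (5, 10), IEEE single (8, 23) and a hypothetical 1-6-9 format.
for fmt in [(5, 10), (8, 23), (6, 9)]:
    print(fmt, quantize(0.1, *fmt))
```

Sweeping `exp_bits` and `man_bits` in a model like this is one way to estimate, before building hardware, how narrow each operation's format can be while keeping acceptable error and range.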


“…This makes the pipeline correct-by-design, and it is very flexible to adjust the precision of operands or to introduce changes to the operations performed. A more complete description can be found in [28].…”

Section: Software Tools and Library Interfaces (mentioning)

confidence: 99%

Accelerating astrophysical particle simulations with programmable hardware (FPGA and GPU)

Spurzem

Berczik

Marcus

et al. 2009

Comp. Sci. Res. Dev.

Self Cite


In a previous paper we have shown that direct gravitational N-body simulations in astrophysics scale very well for moderately parallel supercomputers (order 10-100 nodes). The best balance between computation and communication is reached if the nodes are accelerated by special purpose hardware; in this paper we describe the implementation of particle-based astrophysical simulation codes on new types of accelerator hardware (field programmable gate arrays, FPGA, and graphics processing units, GPU). In addition to direct gravitational N-body simulations we also use the algorithmically similar "smoothed particle hydrodynamics" method as a test application; the algorithms are used for astrophysical problems such as the evolution of galactic nuclei with central black holes and gravitational wave generation, and star formation in galaxies and galactic nuclei. We present the code performance on a single node using different kinds of special hardware (traditional GRAPE, FPGA, and GPU) and some implementation aspects (e.g. accuracy). The results show that GPU hardware for real application codes is as fast as GRAPE, but at an order of magnitude lower price, and that FPGA is useful for acceleration of complex sequences of operations (like SPH). We discuss future prospects and new cluster computers built with new generations of FPGA and GPU cards.
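The central workload in this entry is direct gravitational N-body summation, and its quoted passage mentions adjusting operand precision in the pipeline. The sketch below is a plain software model of the pairwise interaction that such accelerators evaluate, assuming G = 1 and Plummer-style softening `eps`. The function names and the `mant_bits` option, which rounds every operator result to a chosen significand width to mimic a reduced-precision datapath, are illustrative assumptions, not the interface of any of the cited codes. Each operation is computed in double and then rounded; for widths up to about 25 bits this should match a correctly rounded operator of that width, and the exponent range is not modeled.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def round_to_precision(x: float, mant_bits: int) -> float:
    """Round x to mant_bits significand bits (leading bit included), ties to even.
    The exponent range is treated as unbounded (no overflow or subnormals)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)                     # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * (1 << mant_bits)), e - mant_bits)


def direct_accelerations(pos: Sequence[Vec3], mass: Sequence[float],
                         eps: float = 0.0, mant_bits: Optional[int] = None) -> List[Vec3]:
    """Softened direct summation with G = 1:
    a_i = sum_{j != i} m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^(3/2).
    If mant_bits is given, every operator result is rounded to that width."""
    q = (lambda v: v) if mant_bits is None else (lambda v: round_to_precision(v, mant_bits))
    add = lambda a, b: q(a + b)
    sub = lambda a, b: q(a - b)
    mul = lambda a, b: q(a * b)
    div = lambda a, b: q(a / b)
    sqrt = lambda a: q(math.sqrt(a))

    eps2 = q(eps * eps)
    acc: List[Vec3] = []
    for i, (xi, yi, zi) in enumerate(pos):
        ax = ay = az = 0.0
        for j, (xj, yj, zj) in enumerate(pos):
            if j == i:
                continue                                   # no self-interaction
            dx, dy, dz = sub(xj, xi), sub(yj, yi), sub(zj, zi)
            r2 = add(add(mul(dx, dx), mul(dy, dy)), add(mul(dz, dz), eps2))
            inv_r3 = div(1.0, mul(r2, sqrt(r2)))           # one sqrt and one divide per pair
            s = mul(mass[j], inv_r3)
            ax, ay, az = add(ax, mul(s, dx)), add(ay, mul(s, dy)), add(az, mul(s, dz))
        acc.append((ax, ay, az))
    return acc


# Hypothetical three-body setup: compare a 16-bit-significand datapath with double.
pos = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (-0.3, 1.1, 0.5)]
mass = [1.0, 0.5, 2.0]
ref = direct_accelerations(pos, mass, eps=0.01)
low = direct_accelerations(pos, mass, eps=0.01, mant_bits=16)
for a_ref, a_low in zip(ref, low):
    print(a_ref, a_low)
```

Each pair interaction needs one square root and one division, the two operators the first quoted passage singles out as difficult to implement in hardware, and the loop performs N(N-1) such interactions.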
