12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
DOI: 10.1109/fccm.2004.21
Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance

Cited by 113 publications (90 citation statements)
References 14 publications
“…The small matrix multiplies are implemented with an array of multiply-accumulates (MACCs), as described for large matrix multiplies in [Underwood and Hemmert 2004]. In principle, the DGEMM operation is compute bound, since it performs 2N³ operations over only 3N² data, with 4N² memory operations.…”
Section: Dense Matrix Multiply
confidence: 99%
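The arithmetic the statement above relies on can be checked directly: an N×N DGEMM performs 2N³ floating-point operations against 4N² memory operations, so its arithmetic intensity grows linearly with N. A minimal sketch (the function name is illustrative, not from the cited work):

```python
# Sketch of the compute-bound argument quoted above:
# N x N DGEMM does 2*N^3 flops (N^3 multiplies + N^3 adds)
# over 3*N^2 input data, with 4*N^2 memory operations
# (read A, B, C; write C).
def dgemm_arithmetic_intensity(n):
    flops = 2 * n ** 3      # total floating-point operations
    mem_ops = 4 * n ** 2    # total memory operations
    return flops / mem_ops  # simplifies to n / 2


# Intensity grows with n, so for large matrices compute,
# not memory traffic, is the bottleneck.
print(dgemm_arithmetic_intensity(1000))  # → 500.0
```

Because the ratio is N/2, doubling the matrix dimension doubles the work done per memory operation, which is why large dense matrix multiply is the canonical compute-bound kernel.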
“…This led researchers to begin by focusing on kernel operations that are used in HPC and can be provided through a standard library interface. Operations from BLAS [Underwood and Hemmert 2004; Zhuo and Prasanna 2004; Dou et al. 2005; Zhuo and Prasanna 2005a; Zhuo and Prasanna 2005b] to FFTs [Hemmert and Underwood 2005] to the sparse matrix operations at the core of an iterative solver [deLorimier and DeHon 2005; Zhuo and Prasanna 2005c] and even a full CG solver [Morris et al. 2006] have been studied. The fundamental challenge for each of these efforts is the communications with the host.…”
Section: Introduction
confidence: 99%
“…In previous related work, Underwood has performed a study which compared the performance of dot-products in FPGAs and CPUs [6]. In this 2004 paper it was predicted that FPGA-based floating-point operations would overtake CPUs by at least an order of magnitude by 2009.…”
Section: Introduction
confidence: 99%
“…FPGAs are now able to provide high computational parallelism as well as I/O parallelism. They have become an attractive option to accelerate scientific applications [18,20].…”
Section: Introduction
confidence: 99%