2016
DOI: 10.48550/arxiv.1610.06385
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Accelerating BLAS on Custom Architecture through Algorithm-Architecture Co-design

Farhad Merchant,
Tarun Vatwani,
Anupam Chattopadhyay
et al.

Abstract: Basic Linear Algebra Subprograms (BLAS) play key role in high performance and scientific computing applications. Experimentally, yesteryear multicore and General Purpose Graphics Processing Units (GPGPUs) are capable of achieving up to 15 to 57% of the theoretical peak performance at 65W to 240W respectively for compute bound operations like Double/Single Precision General Matrix Multiplication (XGEMM). For bandwidth bound operations like Single/Double precision Matrix-vector Multiplication (XGEMV) the perform… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 25 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?