2013
DOI: 10.1587/transinf.e96.d.2319
Auto-Tuning of Thread Assignment for Matrix-Vector Multiplication on GPUs

Abstract: Modern GPUs have evolved into more general processors capable of executing scientific and engineering computations. Their large number of computing cores provides a highly parallel environment well suited to data-parallel arithmetic, particularly linear algebra operations. Matrix-vector multiplication is one of the most important dense linear algebra operations; it appears in a diverse set of applications across many fields and must therefore be…

Cited by 3 publications (2 citation statements)
References 17 publications
“…Thus, when the matrix is very wide, each thread performs a large number of calculations and performance degrades noticeably. Therefore, we designed a novel autotuning method for matrix-vector multiplication on GPUs, where the number of threads used to compute one element of the result vector can be autotuned according to the matrix size [29]. For very wide matrices, thousands of threads are used to compute one element of the result vector.…”
Section: Design of Parallel AAM Fitting Algorithm for GPUs
confidence: 99%
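The autotuning idea quoted above — letting the number of threads that cooperate on one output element grow with matrix width — can be sketched in plain Python. This is a minimal simulation, not the paper's implementation: the threshold and cap values in `choose_threads_per_element` are illustrative assumptions, and the strided partial sums stand in for what CUDA threads would compute in parallel.

```python
import numpy as np

def choose_threads_per_element(num_cols, max_threads=1024):
    """Heuristic: wider rows get more cooperating threads.
    Doubles the thread count until each thread's share of the
    row drops below a work threshold (values are illustrative)."""
    t = 1
    while t < max_threads and num_cols // t > 64:
        t *= 2
    return t

def matvec(A, x):
    """Simulate the cooperative scheme: each of `tpe` 'threads'
    accumulates a strided partial dot product over one row, and
    the partials are then reduced into that row's output element."""
    m, n = A.shape
    tpe = choose_threads_per_element(n)
    y = np.zeros(m)
    for row in range(m):
        # thread t handles columns t, t+tpe, t+2*tpe, ...
        partials = [A[row, t::tpe] @ x[t::tpe] for t in range(tpe)]
        y[row] = sum(partials)  # reduction step
    return y
```

On a real GPU each partial sum would be computed by one CUDA thread, with the reduction done in shared memory; the strided indexing mirrors the coalesced access pattern such kernels use.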
“…A GPU, on the other hand, is capable of executing more GFLOPS than a conventional CPU. It provides a highly parallel computing environment suitable for numerous data-parallel arithmetic computations such as dense linear algebraic operations [13]. However, the main restriction of earlier GPU generations was their lack of support for the IEEE floating-point standards [12].…”
Section: Gpu and Cudamentioning
confidence: 99%