2011
DOI: 10.1016/j.procs.2011.04.036
Multi-level Optimization of Matrix Multiplication for GPU-equipped Systems

Cited by 18 publications (11 citation statements)
References 7 publications
“…The MMM code for one core is given either by the Cilk tool [66] or by the cblas_sgemm routine of ATLAS. At last, [67] and [68] [76]. Reference [26] shows how to modify the MAGMA GEMM kernels in order to use the Fermi architecture more efficiently.…”
Section: Related Work
confidence: 99%
“…[75] provides a theoretical analysis of why performance drawbacks appear for specific problem sizes when cache memories are used. Finally, in [76], different data array layouts are evaluated, such as Z-Morton and X-Morton. All the above works are empirical techniques and do not give a methodology.…”
Section: Related Work
confidence: 99%
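The Z-Morton layout mentioned in the statement above linearizes a 2D array by interleaving the bits of the row and column indices, so that nearby elements of a tile stay close in memory. A minimal sketch of computing that index (the function name and bit width are illustrative, not from the cited work):

```python
def morton_index(i, j, bits=16):
    """Z-Morton (Z-order) linear index of element (i, j):
    interleave the bits of the row index i and column index j."""
    idx = 0
    for b in range(bits):
        idx |= ((i >> b) & 1) << (2 * b + 1)  # row bits go to odd positions
        idx |= ((j >> b) & 1) << (2 * b)      # column bits go to even positions
    return idx
```

Traversing elements in increasing `morton_index` order visits the matrix in recursive Z-shaped tiles, which is why such layouts interact well with cache hierarchies.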
“…The whole set of computations can be seen as a 3D cube where element (i, k, j) corresponds to the basic operation a_{i,k} · b_{k,j}. With the notable exception of recently introduced 2.5D schemes [42], all implementations (see [43] for a recent survey), including those implemented with MapReduce [36], [27] or designed for GPUs [44], are based on the ScaLAPACK algorithm [45], which uses the outer product described in Section IV-A as its building block. For the sake of simplicity, we will concentrate on the case of square matrices only.…”
Section: B. 3D Data Distribution: Matrix Multiplication
confidence: 99%
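The outer-product building block described in the statement above accumulates the product as a sum of rank-1 updates: for each k, the column A[:, k] is combined with the row B[k, :]. A minimal sketch for square matrices (plain Python lists; the function name is illustrative, not from the cited work):

```python
def matmul_outer_product(A, B):
    """Compute C = A @ B as a sum of rank-1 (outer-product) updates,
    one per index k, for square n x n matrices."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for k in range(n):                     # one rank-1 update per k
        for i in range(n):
            aik = A[i][k]
            for j in range(n):
                C[i][j] += aik * B[k][j]   # basic operation (i, k, j)
    return C
```

Each (i, k, j) triple touched by the inner loops is one cell of the 3D computation cube; distributed schemes differ in how they partition this cube across processors.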
“…Since the appearance of CUDA programming, a large body of research has been carried out seeking better performance. This is the case for different computational cores, such as matrix multiplication [10], the Boltzmann equation [11], or the parallel 3D fast wavelet transform [12].…”
Section: State of the Art
confidence: 99%