2010 First International Conference on Networking and Computing
DOI: 10.1109/ic-nc.2010.39
Parallel Matrix-Matrix Multiplication Based on HPL with a GPU-Accelerated PC Cluster

Cited by 4 publications (4 citation statements)
References 6 publications
“…We evaluate the energy savings of MM due to overclocking only, undervolting only, and the combination of overclocking and undervolting, considering both power consumption and execution time. We used a matrix multiplication application (cuBLAS-MM) as it is a key sub-routine for many scientific applications like HPL and ScaLAPACK [23], [24]. For instance, MM constitutes more than 90% of the computation cost in HPL [23].…”
Section: Evaluation, 4.1 Experimental Setup
confidence: 99%
“…We used a matrix multiplication application (cuBLAS-MM) as it is a key sub-routine for many scientific applications like HPL and ScaLAPACK [23], [24]. For instance, MM constitutes more than 90% of the computation cost in HPL [23]. Our proposed method can easily be integrated into these applications to save a considerable amount of energy.…”
Section: Evaluation, 4.1 Experimental Setup
confidence: 99%
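The claim above, that matrix multiplication accounts for more than 90% of HPL's computation cost, can be illustrated with a simple flop-count model of blocked right-looking LU factorization, the kernel HPL benchmarks. The function and the per-kernel cost formulas below are illustrative approximations, not taken from the cited papers:

```python
# Hypothetical flop-count model for blocked right-looking LU factorization.
# At each step, a b-column panel is factored, a triangular solve forms the
# U panel, and a rank-b DGEMM updates the trailing matrix.
def lu_flops(n, b):
    gemm = other = 0
    for k in range(0, n, b):
        m = n - k          # order of the remaining trailing matrix
        t = m - b          # order of the matrix updated by DGEMM
        other += m * b * b         # approximate panel-factorization cost
        if t > 0:
            other += b * b * t     # triangular solve for the U panel
            gemm += 2 * t * t * b  # rank-b trailing-matrix update (DGEMM)
    return gemm, other

gemm, other = lu_flops(10000, 128)
print(f"DGEMM share of LU flops: {gemm / (gemm + other):.1%}")
```

For problem sizes typical of HPL runs (n much larger than the block size b), the DGEMM share computed by this model exceeds 90%, consistent with the statement quoted above.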
“…In addition, [6] and [8] give an introduction to programming GPUs using CUDA, NVIDIA's language for programming heterogeneous systems that include conventional CPUs and GPUs.…”
Section: Introduction
confidence: 99%
“…Therefore, it is important to overlap computation and communication. In our present work [1], we examined an efficient implementation of Linpack. Our approach is based on the hybrid MPI-OpenMP with thread-to-thread communication (Hybrid TC) model introduced by [9].…”
Section: Introduction
confidence: 99%
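The overlap of computation and communication mentioned above can be sketched with a minimal double-buffering loop: while the current panel is being processed, the next one is already in flight. This is a hedged stand-in, not the paper's implementation; a Python background thread plays the role of a non-blocking MPI transfer, and the names `fetch` and `run` are hypothetical:

```python
import threading
import queue
import numpy as np

def fetch(step, out_q):
    # Stand-in for a non-blocking receive of the next panel (e.g. MPI_Irecv).
    out_q.put(np.full((4, 4), step, dtype=np.float64))

def run(steps=4):
    q = queue.Queue(maxsize=1)
    total = 0.0
    # Post the first transfer before entering the compute loop.
    t = threading.Thread(target=fetch, args=(0, q))
    t.start()
    for step in range(steps):
        panel = q.get()            # wait for the panel currently in flight
        t.join()
        if step + 1 < steps:
            # Start the next transfer so it overlaps this step's compute.
            t = threading.Thread(target=fetch, args=(step + 1, q))
            t.start()
        total += float(panel.sum())  # stand-in for the trailing-matrix update
    return total

print(run())  # 16 elements per panel * (0+1+2+3) = 96.0
```

The key design point, as in the Hybrid TC model cited above, is that the communication for step k+1 is issued before the computation for step k begins, so transfer latency is hidden behind useful work.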