2010
DOI: 10.1007/978-3-642-15291-7_29

Optimized Dense Matrix Multiplication on a Many-Core Architecture

Cited by 18 publications (20 citation statements)
References 15 publications
“…The limitation of memory bandwidth on many-core applications has been addressed to other levels in the memory hierarchy for some linear algebra applications [10,3]. They have proposed alternatives to find optimum tiling to the register level and mechanism for hiding memory latency.…”
Section: Introduction (mentioning)
confidence: 99%
“…There are several works optimizing MMM for many cores [53][54][55][56][57][58][59][60][61][62][63][64]. The fastest implementations are given in [23] where MMM is parallelized on Intel Xeon Phi and on IBM Blue Gene/Q; an analysis is made on which loop is going to be parallelized.…”
Section: Related Work (mentioning)
confidence: 99%
“…In general, fine grain execution is useful only when the overhead associated with the execution is acceptable. In contrast, coarse-grained executions decrease the proportional overhead of task management at the cost of reducing parallelism and reducing the opportunities for load balancing in many-core systems [9].…”
Section: Motivation (mentioning)
confidence: 99%