Proceedings of the 28th ACM International Conference on Supercomputing 2014
DOI: 10.1145/2597652.2597670

Scaling up matrix computations on shared-memory manycore systems with 1000 CPU cores

Abstract: While the growing number of cores per chip allows researchers to solve larger scientific and engineering problems, the parallel efficiency of the deployed parallel software starts to decrease. This scalability problem affects both vendor-provided and open-source software and wastes CPU cycles and energy. Expecting CPUs with hundreds of cores to be imminent, we have designed a new framework to perform matrix computations for massively many cores. Our performance analysis on manycore systems shows that th…
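
The abstract breaks off above, but the computational pattern such frameworks organize is well known: partition the matrices into tiles and schedule independent tile tasks across the cores. The following is a minimal, hypothetical sketch of that pattern in Python; the tile size, worker count, and function names are illustrative assumptions, not the paper's framework or API.

# Hypothetical sketch: tile-based, task-parallel matrix multiplication,
# the general style of computation a manycore matrix framework schedules.
# TILE and WORKERS are illustrative assumptions, not the paper's values.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

TILE = 256      # assumed tile size; real frameworks tune this per cache level
WORKERS = 8     # assumed worker count; a 1000-core system would use far more

def tiled_matmul(A, B):
    n = A.shape[0]
    C = np.zeros((n, n))
    def compute_tile(i, j):
        # Each task owns one output tile, so tasks write disjoint memory.
        for k in range(0, n, TILE):
            C[i:i+TILE, j:j+TILE] += A[i:i+TILE, k:k+TILE] @ B[k:k+TILE, j:j+TILE]
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        for i in range(0, n, TILE):
            for j in range(0, n, TILE):
                pool.submit(compute_tile, i, j)   # pool waits on exit
    return C

A = np.random.rand(1024, 1024)
B = np.random.rand(1024, 1024)
assert np.allclose(tiled_matmul(A, B), A @ B)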

Cited by 11 publications (4 citation statements); references 20 publications.
“…Even if the number of arithmetic operations is reduced by 100×, the overhead of lookups and cache misses is so dominant that switching to sparse matrices would not pay off. The gap is widened even further by the use of steadily improving, highly tuned, numerical libraries that allow for extremely fast dense matrix multiplication, exploiting the minute details of the underlying CPU or GPU hardware [16,9]. Also, non-uniform sparse models require more sophisticated engineering and computing infrastructure.…”
Section: Motivation and High Level Considerations
confidence: 99%
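
The trade-off the citing authors quantify (a 100× reduction in arithmetic against lookup and cache-miss overhead) can be observed directly. Below is a minimal sketch, assuming NumPy and SciPy are available, that times a tuned dense multiply against a CSR sparse multiply with about 1% nonzeros; the matrix size and density are illustrative choices, not values from either paper.

# Minimal sketch (assumes NumPy and SciPy): dense GEMM vs. sparse matmul.
# With ~1% nonzeros the sparse operand does ~100x fewer multiply-adds, yet
# index lookups and irregular memory access shrink the observed gap far
# below 100x, and the tuned dense kernel can still come out ahead.
import time
import numpy as np
import scipy.sparse as sp

n = 2000
rng = np.random.default_rng(0)
dense = rng.random((n, n))
mask = rng.random((n, n)) < 0.01           # keep ~1% of entries
sparse = sp.csr_matrix(dense * mask)
x = rng.random((n, n))

t0 = time.perf_counter()
dense @ x                                  # tuned dense BLAS kernel
t_dense = time.perf_counter() - t0

t0 = time.perf_counter()
sparse @ x                                 # CSR: indirect loads per nonzero
t_sparse = time.perf_counter() - t0

print(f"dense: {t_dense:.3f}s  sparse (1% nnz): {t_sparse:.3f}s")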
“…Each region is composed of multiple core groups connected through intra-region interconnection interfaces. This architecture aims to make full use of program locality and features high performance, high scalability, and flexible physical implementation [3].…”
Section: Proprietary CPU: Matrix-2000+
confidence: 99%
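
The locality argument in this statement is the usual motivation for pinning worker threads to one core group, so that threads sharing data also share that group's caches and interconnect. Below is a hypothetical sketch of that placement policy, assuming a Linux host with at least 16 logical CPUs and made-up group boundaries (the real Matrix-2000+ topology is not given here); a production runtime would do this in native code, for example via hwloc, since Python's GIL limits actual parallel compute.

# Hypothetical sketch: pinning worker threads to a "core group" so threads
# that share data stay within one region's caches. The group boundaries are
# illustrative assumptions, not the Matrix-2000+ topology.
# Requires Linux (os.sched_setaffinity is Linux-only).
import os
import threading

CORE_GROUPS = [range(0, 8), range(8, 16)]   # assumed: two groups of 8 cores

def worker(group_id, task):
    # pid 0 applies the affinity mask to the calling thread only.
    os.sched_setaffinity(0, set(CORE_GROUPS[group_id]))
    task()

def task():
    s = sum(i * i for i in range(10**6))    # placeholder compute kernel
    print(f"thread {threading.get_ident()} done: {s}")

threads = [threading.Thread(target=worker, args=(g, task)) for g in (0, 1)]
for t in threads: t.start()
for t in threads: t.join()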