SC20: International Conference for High Performance Computing, Networking, Storage and Analysis 2020
DOI: 10.1109/sc41405.2020.00062
Fast Stencil-Code Computation on a Wafer-Scale Processor

Cited by 46 publications (31 citation statements)
References 14 publications
“…Recently, researchers at Argonne National Laboratory developed an AI-driven simulation framework for solving the same MD problem, yielding a 50x speedup in time to solution over the traditional HPC method [616]. And some work suggests these approaches need not be mutually exclusive: it has been shown in the context of computational fluid dynamics that traditional HPC workloads can be run alongside AI training to provide accelerated data-feed paths [617]. Co-locating workloads in this way may be necessary for petascale (approaching exascale) scientific simulations. Compute aside, supercomputers or large clusters (distributed compute nodes) are the only way to host some of the largest currently available models: as their memory requirements grow into trillions of parameters, partitioning the model is the only way.…”
Section: Accelerated Computing (mentioning)
confidence: 99%
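A rough back-of-the-envelope sketch (not taken from the cited works) of why trillion-parameter models force partitioning across nodes: assuming fp16 (2-byte) parameters and a hypothetical 80 GiB per-node memory budget, the minimum node count follows directly from memory arithmetic.

```python
# Illustrative only: assumes 2-byte (fp16) parameters and a hypothetical
# 80 GiB per-node memory budget; real deployments also need memory for
# activations, optimizer state, and communication buffers.
def min_nodes(num_params, bytes_per_param=2, node_memory_gib=80):
    """Smallest node count whose combined memory can hold the raw parameters."""
    total_bytes = num_params * bytes_per_param
    node_bytes = node_memory_gib * 2**30
    return -(-total_bytes // node_bytes)  # ceiling division

# One trillion fp16 parameters is ~2 TB of weights alone, so the model
# cannot fit on a single accelerator and must be partitioned.
print(min_nodes(1_000_000_000_000))  # -> 24 under these assumptions
```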
“…5, for some source-destination pairs there may be several alternative shortest path vectors, each of which specifies its own subset of the reserve shortest paths. The choice of the most promising vector (from the point of view of providing the largest number of reserve paths when moving along the shortest path) is determined on the basis of Lemma 4 derived from (1), but the first one has a lower denominator, Q.E.D. This implies a possible routing strategy: when routing a packet, choose the coordinate of the shortest path vector whose decrease by 1 preserves the greatest number of reserve paths when moving along the shortest path.…”
Section: Lemma 2 For Any Pair Source-Destination With Numbers (mentioning)
confidence: 99%
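The quoted strategy can be illustrated with a small sketch (an interpretation, not code from the cited paper): in a mesh, the number of shortest paths for a hop-offset vector (d_1, ..., d_n) is the multinomial coefficient (d_1 + ... + d_n)! / (d_1! · ... · d_n!), so the coordinate whose decrement preserves the most reserve paths can be found by direct counting.

```python
from math import factorial, prod

def minimal_paths(offsets):
    """Number of distinct shortest paths for a hop-offset vector in a mesh:
    the multinomial coefficient (sum d_i)! / (d_1! * ... * d_n!)."""
    total = sum(offsets)
    return factorial(total) // prod(factorial(d) for d in offsets)

def best_coordinate(offsets):
    """Coordinate whose decrease by 1 leaves the most shortest (reserve) paths.
    Illustrative reading of the routing strategy quoted above."""
    best_i, best_count = None, -1
    for i, d in enumerate(offsets):
        if d == 0:
            continue
        reduced = list(offsets)
        reduced[i] -= 1
        count = minimal_paths(reduced)
        if count > best_count:
            best_i, best_count = i, count
    return best_i, best_count

# Example: for offsets (3, 1), reducing the larger coordinate keeps 3 of the
# 4 shortest paths, versus only 1 if the smaller coordinate is reduced.
print(best_coordinate([3, 1]))  # -> (0, 3)
```

Decrementing the largest component shrinks the multinomial count the least (the remaining count equals the original times d_i divided by the total hop count), which is consistent with the "lower denominator" comparison in the quoted lemma.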
“…The development of multiprocessor systems-on-chip (MPSoCs) has become a ubiquitous trend, and modern chips can now accommodate tens, hundreds or even thousands of processor cores. For example, the latest versions of the WSE2 chip from Cerebras can contain up to 850,000 computing cores [1,2], and the project from Esperanto Technologies promises 1088 energy-efficient ET-Minion 64-bit RISC-V cores, each with a vector/tensor unit, in the ET-SoC-1 chip [3]. The operation of such large MPSoCs is not possible without a high-performance communication subsystem, a role currently performed by the network-on-chip (NoC).…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, matrix dimension sizes span from single digits to millions, while matrix sparsity spans from ∼10⁻⁵% dense to fully dense [9]. The vast range of workloads has led to many accelerator architecture proposals, as accelerators achieve higher throughput than CPUs and higher energy efficiency than GPUs [21], [46]. [48] performs well for workloads of high unstructured sparsity, but not for dense computations, due to its sparse controller overhead. Large datacenters require flexibility, as they must have the compute and memory resources to perform all current and future workloads efficiently.…”
Section: Introduction (mentioning)
confidence: 99%