2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) 2019
DOI: 10.23919/date.2019.8715088

Coherently Attached Programmable Near-Memory Acceleration Platform and its application to Stencil Processing

Cited by 13 publications (10 citation statements)
References 12 publications
“…More recently, the use of FPGAs to accelerate stencils has been proposed [9,36,37,44]. Augmenting general-purpose cores with specialized FPGA accelerators is a promising approach to enhance overall system performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…We include as a competitor to NMC an NVIDIA V100, one of the latest GPU with 32GB of HBM2 memory at 900 GB/s, which uses similar technology to the NMC platform. As NMC systems we use a custom hardware design called Access Processor (AP) [14], which can be mapped on different FPGAs (DDR4 and HBM2). Differently from a classical general-purpose computer, where the access bandwidth and latency depend on a complex mixture of workload characteristics and the memory hierarchy, the Access Processor (AP) design comprises the so-called memory controller, which has the feature of enabling more control over the memory system and programming all the concurrently running data streams from/to the attached NMC accelerators (see Fig 5).…”
Section: A System In Use (mentioning)
confidence: 99%
“…The AP provides fine-grained control to schedule the accesses to the DDR4 and HBM2 memory (see Fig. 10), the transfer of the data to and from the FPGAs internal SRAM (Block RAM and/or UltraRAM), and the processing of the data [14]. Because the various 1D FFTs (see Fig.…”
Section: B Offloading On Nmc Systems (mentioning)
confidence: 99%
“…They are similar to the kernels used in other weather and climate models [97,125,177]. Their performance is dominated by memory-bound operations with unique irregular memory access patterns and low arithmetic intensity that often results in <10% sustained floating-point performance on current CPU-based systems [165].…”
Section: Introduction (mentioning)
confidence: 99%