2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)
DOI: 10.1109/isca52012.2021.00024
Vector Runahead

Abstract: The memory wall places a significant limit on performance for many modern workloads. These applications feature complex chains of dependent, indirect memory accesses, which cannot be picked up by even the most advanced microarchitectural prefetchers. The result is that current out-of-order superscalar processors spend the majority of their time stalled. While it is possible to build special-purpose architectures to exploit the fundamental memory-level parallelism, a microarchitectural technique to automatically…
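For orientation, the sketch below (not taken from the paper; function and array names are illustrative only) shows the kind of dependent, indirect access chain the abstract refers to: each load's address is produced by a previous load, so a stride or spatial prefetcher sees no regular pattern and an out-of-order core stalls on the resulting cache misses.

```c
#include <stddef.h>

/* Illustrative pattern: a chain of dependent, indirect memory accesses. */
long sum_indirect(const int *keys, const long *table,
                  const long *values, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int k = keys[i];      /* sequential stream: easy for a stride prefetcher  */
        long idx = table[k];  /* indirect: address depends on the loaded value k  */
        sum += values[idx];   /* dependent: cannot issue until table[k] returns   */
    }
    return sum;
}
```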

Cited by 8 publications (2 citation statements). References 91 publications.
“…Optimizing Irregular Memory Accesses: Recent work has made significant strides in domain-agnostic prefetching for irregular applications [10, 43, 57]. Our split-tree structure can be seen as an application-specific prefetcher and achieves "perfect prefetching" in that 1) off-chip data accesses are overlapped with computation, 2) data needed by the accelerator are readily available on-chip without stalls, and 3) no redundant DRAM accesses are needed.…”
Section: Related Work
confidence: 99%
“…We divide the traditional prefetching algorithms into three broad categories: precomputation-based, temporal, and spatial. Precomputation-based prefetchers (e.g., runahead [46, 59, 60, 63, 95-101] and helper-thread execution [33, 40, 41, 45, 73, 74, 87, 106, 119, 128, 141, 144]) pre-execute program code to generate future memory requests. These prefetchers can generate highly-accurate prefetches even when no recognizable pattern exists in program memory requests.…”
Section: Other Related Work
confidence: 99%
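As a rough software analogue of the precomputation idea quoted above (a sketch under assumed names, not how Vector Runahead or any cited hardware scheme is actually implemented), the loop below runs a stripped-down address-generating slice a fixed distance ahead of the main computation and issues GCC/Clang `__builtin_prefetch` hints for the addresses it derives; hardware runahead and helper-thread prefetchers perform this pre-execution speculatively in the microarchitecture instead.

```c
#include <stddef.h>

#define DIST 16  /* assumed lookahead distance; purely illustrative */

long sum_with_precompute(const int *keys, const long *table,
                         const long *values, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + DIST < n) {
            /* Pre-execute the address-generating slice of a future
             * iteration and prefetch the line it will touch. Only the
             * first level of indirection is covered here; the second
             * level (values[table[k_ahead]]) would need the prefetched
             * data back first, which is the gap hardware runahead
             * techniques aim to close. */
            int k_ahead = keys[i + DIST];
            __builtin_prefetch(&table[k_ahead], 0 /* read */, 1 /* low locality */);
        }
        long idx = table[keys[i]];  /* first indirection */
        sum += values[idx];         /* dependent load    */
    }
    return sum;
}
```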