A Cache Replacement Policy Using Adaptive Insertion and Re-reference Prediction

Zhang, Xi; Li, Chongmin; Wang, Haixia; Wang, Dongsheng

doi:10.1109/sbac-pad.2010.21

Cited by 11 publications

(3 citation statements)

References 15 publications

“…For replacement, we have tested random and RRIP [84]. Random replacement resulted in an surprisingly worse speedup, due to its inability to capture the locality.…”

Section: Sensitivity To Cache Configuration (mentioning)

confidence: 99%

Slice-and-Forge: Making Better Use of Caches for Graph Convolutional Network Accelerators

Yoo, Song, Lee et al. 2023

Preprint

Graph convolutional networks (GCNs) are becoming increasingly popular as they can process a wide variety of data formats that prior deep neural networks cannot easily support. One key challenge in designing hardware accelerators for GCNs is the vast size and randomness in their data access patterns which greatly reduces the effectiveness of the limited on-chip cache. Aimed at improving the effectiveness of the cache by mitigating the irregular data accesses, prior studies often employ the vertex tiling techniques used in traditional graph processing applications. While being effective at enhancing the cache efficiency, those approaches are often sensitive to the tiling configurations where the optimal setting heavily depends on target input datasets. Furthermore, the existing solutions require manual tuning through trial-and-error or rely on sub-optimal analytical models. In this paper, we propose Slice-and-Forge (SnF), an efficient hardware accelerator for GCNs which greatly improves the effectiveness of the limited on-chip cache. SnF chooses a tiling strategy named feature slicing that splits the features into vertical slices and processes them in the outermost loop of the execution. This particular choice results in a repetition of the identical computational patterns over irregular graph data over multiple rounds. Taking advantage of such repetitions, SnF dynamically tunes its tile size. Our experimental results reveal that SnF can achieve 1.73× higher performance in geomean compared to prior work on multi-engine settings, and 1.46× higher performance in geomean on small scale settings, without the need for off-line analyses.
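
The loop order that the abstract calls feature slicing can be illustrated with a minimal sketch. This is not the SnF accelerator: the function name `aggregate_sliced`, the edge-list representation, and the plain sum aggregation are assumptions made for illustration, and the dynamic tile-size tuning is not modeled. The sketch only shows that placing the slice loop outermost makes every round walk the graph in the same order.

```python
def aggregate_sliced(edges, features, slice_width):
    """Sum neighbor features into each destination vertex, one feature slice at a time.

    edges: list of (src, dst) pairs (hypothetical representation).
    features: list of per-vertex feature rows, all of equal length.
    slice_width: number of feature columns per vertical slice.
    """
    if slice_width <= 0:
        raise ValueError("slice_width must be positive")
    num_vertices = len(features)
    num_features = len(features[0]) if features else 0
    out = [[0.0] * num_features for _ in range(num_vertices)]

    # Outermost loop: one round per vertical feature slice.
    for start in range(0, num_features, slice_width):
        stop = min(start + slice_width, num_features)
        # Every round traverses the edges in the same order, so the
        # irregular access pattern repeats identically across rounds.
        for src, dst in edges:
            for f in range(start, stop):
                out[dst][f] += features[src][f]
    return out
```

The result does not depend on `slice_width`; only the order in which feature columns are visited changes, which is what lets a design observe one round and adjust before the next.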

“…This results in improvement in performance of vpr and twolf for both TA-DRRIP and ACR. Since TA-DRRIP searches for victim from left to right, it is not aware of access recency [22]. This may lead to inaccurate re-reference interval prediction.…”

Section: Effect On System Performance (mentioning)

confidence: 99%
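
The left-to-right victim search described in the statement above can be sketched as follows. This is a minimal illustration of RRIP-style victim selection within one cache set, assuming 2-bit re-reference prediction values (RRPVs); the name `select_victim` and the list representation are hypothetical, and the thread-aware and set-dueling parts of TA-DRRIP are not modeled.

```python
RRPV_MAX = 3  # assumed 2-bit RRPV: 3 means "re-referenced in the distant future"

def select_victim(rrpv):
    """Return the index of the victim way for one cache set.

    rrpv: list of per-way RRPVs in the range 0..RRPV_MAX, modified in place.
    """
    if not rrpv:
        raise ValueError("cache set has no ways")
    while True:
        # Scan ways from left to right; the first way at RRPV_MAX is evicted.
        for way, value in enumerate(rrpv):
            if value == RRPV_MAX:
                return way
        # No candidate found: age every line by one and scan again.
        for way in range(len(rrpv)):
            rrpv[way] += 1
```

When several ways hold the same RRPV, the scan always picks the leftmost one regardless of which was touched more recently, which is the recency blindness the citing paper points out.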

“…Each access from the LLC access trace of an application is put into one of the five bins based on the reuse distance of the cache line that is being accessed. These five bins correspond to the following mutually exclusive integral ranges for the reuse distance values: [0-7]; [8-15]; [16-31]; [32-63] and [≥64] (implies no reuse or reuse distance larger than 63). Figure 2 provides the percentage of cache accesses in each of these categories.…”

Section: Introduction (mentioning)

confidence: 99%
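
The binning described in the statement above (ranges 0-7, 8-15, 16-31, 32-63, and 64 or more) can be sketched in a few lines. The excerpt does not define reuse distance, so this sketch assumes it is the number of distinct other lines touched since the previous access to the same line, with a first access counted as "no reuse". The function names are hypothetical.

```python
from collections import Counter

BINS = ["0-7", "8-15", "16-31", "32-63", ">=64"]

def bin_of(distance):
    """Map a reuse distance (None means no reuse) to one of the five bins."""
    if distance is None or distance >= 64:
        return ">=64"
    if distance <= 7:
        return "0-7"
    if distance <= 15:
        return "8-15"
    if distance <= 31:
        return "16-31"
    return "32-63"

def reuse_distances(trace):
    """Return one reuse distance per access, under the assumption stated above."""
    stack = []  # most recently accessed line first
    distances = []
    for line in trace:
        if line in stack:
            distances.append(stack.index(line))  # distinct lines seen since last use
            stack.remove(line)
        else:
            distances.append(None)  # first access: no earlier use to measure from
        stack.insert(0, line)
    return distances

def bin_percentages(trace):
    """Percentage of accesses that fall in each bin."""
    counts = Counter(bin_of(d) for d in reuse_distances(trace))
    total = len(trace) or 1  # avoid division by zero on an empty trace
    return {label: 100.0 * counts[label] / total for label in BINS}
```

For example, `bin_percentages(["A", "B", "A", "C", "B", "A"])` places the three first-time accesses in the last bin and the three reuses (distances 1, 2, 2) in the first bin.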

ACR: Application Aware Cache Replacement for Shared Caches in Multi-Core Systems

Warrier 2019

IJCET

Modern multi-core systems allow concurrent execution of different applications on a single chip. Such multicores handle the large bandwidth requirement from the processing cores by employing multiple levels of caches with one or two levels of private caches along with a shared last-level cache (LLC). In shared LLC, when applications with varying access behavior compete with each other for space, conventional single core cache replacement techniques can significantly degrade the system performance. In such scenarios, we need an efficient replacement policy for reducing the off-chip memory traffic as well as contention for the memory bandwidth. This paper proposes a novel Application-aware Cache Replacement (ACR) policy for the shared LLC. ACR policy considers the memory access behavior of the applications during the process of victim selection to prevent victimizing a low access rate application by a high-access rate application. It dynamically tracks the maximum lifetime of cache lines in shared LLC for each concurrent application and helps in efficient utilization of the cache space. Experimental evaluation of ACR policy for 4-core systems, with 16-way set associative 4MB LLC, using SPEC CPU 2000 and 2006 benchmark suites shows a geometric mean speed-up of 8.7% over the least recently used (LRU) replacement policy. We show that the ACR policy performs better than recently proposed thread-aware dynamic re-reference interval prediction (TA-DRRIP) and protecting distance based (PDP) techniques for various 2-core, 4-core and 8-core workloads.
