2020
DOI: 10.1007/s11227-019-03135-7

A memory scheduling strategy for eliminating memory access interference in heterogeneous system

Abstract: Multiple CPUs and GPUs are integrated on the same chip and share memory, so access requests from different cores interfere with each other. Memory requests from the GPU seriously degrade CPU memory access performance, and requests from multiple CPUs are intertwined when accessing memory, which greatly affects their performance. The difference in access latency between GPU cores further increases the average memory access latency. In order to solve the problems encountered in the shared memory of heteroge…

Cited by 14 publications (10 citation statements) | References 23 publications
“…Most of the GPU resources are underutilized while occupying a huge amount of shared memory space. This usage of a huge amount of memory calls for a conflict resolution in CPU and GPU memory access, which is investigated in very recent literature [32], [33], [34], and can be put into action using the data from this paper. In the Zotac system, which has a discrete GPU unlike the Xaviers, the CV kernels show much better performance and lower CPU bottleneck issue, however the Zotac platform is not viable for aerial robots.…”
Section: Discussion
confidence: 99%
“…Autonomous robots continue to suffer from strict limits on computation and power, especially for small aerial vehicles. From a resource management perspective, we therefore focus on profiling that can inform two primary techniques: (1) thread-level scheduling that prioritizes among multiple computational kernels competing for shared computational resources such as CPU, GPU, and memory [31,32,33,34]; and (2) the Dynamic Voltage and Frequency Scaling (DVFS) mechanism on modern processors that allows for finegrained frequency scaling and power management [35]. We aim to analyze each computational kernel for sustainability and predictability by showing their worst case behavior, average behavior, and the variation from them.…”
Section: Preliminaries
confidence: 99%
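The DVFS mechanism mentioned above saves energy because dynamic CMOS power grows with both voltage and frequency. A minimal sketch of the classic dynamic-power model (illustrative constants only; real platforms expose a fixed table of discrete voltage/frequency pairs, and this formula is the textbook model, not a measurement from the cited work):

```python
def dynamic_power(capacitance, voltage, freq_hz):
    """Textbook CMOS dynamic-power model: P = C * V^2 * f.

    Lowering voltage and frequency together reduces power superlinearly,
    which is why DVFS is attractive for power-constrained robots.
    """
    return capacitance * voltage ** 2 * freq_hz

# Halving frequency while dropping voltage from 1.0 V to 0.8 V
# (hypothetical operating points) cuts power by roughly 3x:
p_high = dynamic_power(1e-9, 1.0, 2.0e9)   # ~2.0 W
p_low = dynamic_power(1e-9, 0.8, 1.0e9)    # ~0.64 W
```

The quadratic voltage term dominates: frequency scaling alone gives only linear savings, so practical governors scale voltage down with frequency whenever the workload allows.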
“…Temporal memory resource isolation: In 2020, Fang et al. [42] proposed a memory access scheduling strategy to solve the problem of shared memory contention in systems where GPUs are used. The method consists of three steps: initially, the memory requests are separated into two queues in the memory controller, preventing GPU memory access requests from interfering with CPU requests.…”
Section: Proposal Focused on Achieving a Better Performance
confidence: 99%
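The first step described above, separating CPU and GPU requests into distinct queues in the memory controller, can be sketched as follows. This is a simplified illustration of split-queue scheduling; the class name, queue names, and the CPU-first priority rule are assumptions for the sketch, not the exact policy of Fang et al. [42]:

```python
from collections import deque


class TwoQueueMemoryController:
    """Sketch of split-queue memory scheduling: CPU and GPU requests are
    buffered separately so that bursty, bandwidth-hungry GPU traffic
    cannot starve latency-sensitive CPU requests."""

    def __init__(self):
        self.cpu_queue = deque()
        self.gpu_queue = deque()

    def enqueue(self, source, request):
        # Requests are segregated by origin at arrival time.
        queue = self.cpu_queue if source == "cpu" else self.gpu_queue
        queue.append(request)

    def next_request(self):
        # CPU requests are served first; GPU requests are drained only
        # when no CPU request is waiting (assumed priority rule).
        if self.cpu_queue:
            return self.cpu_queue.popleft()
        if self.gpu_queue:
            return self.gpu_queue.popleft()
        return None
```

A strict CPU-first rule like this can starve the GPU under sustained CPU load, which is why real schedulers typically add batching or bandwidth guarantees on top of the queue split.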
“…However, fixed-size partitions cause low cache utilization and consequently reduced performance. On the other hand, in dynamic partitioning techniques, the size of the allocated cache partitions varies during runtime, giving high cache utilization and causing lower predictability [98]. Furthermore, cache partitioning techniques can be characterized as index-based partitioning and way-based partitioning based on the structure of a set-associative cache.…”

[Survey table flattened into the excerpt above, reconstructed; column labels inferred:
Refs | Technique | Type | Implementation | Task model
(refs lost in excerpt) | Scheduling algorithms | Temporal | Hardware | HRT
[69] | Memory request throttling | Temporal | Software | SRT/HRT
[74]–[77] | Predictable DRAM controllers | Predictable controller | Hardware | SRT/HRT
[78], [81], [82] | Bank partitioning | Spatial | Software | AVG
[60] | Channel partitioning | Spatial | Soft/Hard | AVG
[80] | Page policy control | Spatial | Soft/Hard | AVG
[83] | Page policy control | Spatial | None | AVG
[84] | Bank partitioning | Spatial | None | AVG
[85] | Decoupled direct access | Spatial | Hardware | AVG
[42] | Scheduling algorithms | Temporal | None | AVG
[86] | Task allocation | Temporal | Software | AVG]
Section: B. Cache Interference
confidence: 99%
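Way-based partitioning, as characterized in the excerpt above, restricts each core to a subset of the ways in every cache set, typically via a per-core bitmask (Intel's Cache Allocation Technology works this way). A minimal sketch, with hypothetical masks, of how fixed masks partition an 8-way cache and why static sizes can leave capacity idle:

```python
def allowed_ways(way_mask, num_ways=8):
    """Return the cache ways a core may allocate into, given a bitmask.

    Each set bit in way_mask grants access to one way of every set.
    The partition size is fixed by the mask, so a core that needs less
    cache than its mask grants leaves those ways underutilized.
    """
    return [w for w in range(num_ways) if way_mask & (1 << w)]


# Hypothetical split of an 8-way cache between a CPU core and a GPU:
cpu_mask = 0b00001111   # ways 0-3 reserved for the CPU
gpu_mask = 0b11110000   # ways 4-7 reserved for the GPU
```

Because the two masks are disjoint, neither client can evict the other's lines, which is exactly the predictability benefit, and exactly the utilization cost, that the excerpt contrasts with dynamic partitioning.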
“…Existing research uses multiple methods of GPU cache management to address inefficiencies in the GPU memory subsystem. In addition to the above methods, there are warp throttling [32] and memory scheduling strategies [33].…”
Section: Related Work
confidence: 99%