A novel warp scheduling scheme considering long-latency operations for high-performance GPUs

Thuan, Cong; Choi, Hyung Do; Chung, Sung Woo; Kim, Cheol Hong

doi:10.1007/s11227-019-03091-2

Cited by 8 publications

(3 citation statements)

References 35 publications


“…Besides, some scheduling algorithms are specifically designed for the long operation latency problem on GPUs, such as the LPI [13] and the Long-Latency Operation-Based Scheduling (LLOS) [14] algorithms.…”

Section: Typical Warp Scheduling Algorithms (mentioning)

confidence: 99%

“…All these approaches reduce the numbers of long operations as much as possible but do not directly address the pipeline stalling problem caused by long operation delays [11]. In literatures [12]-[14], a series of studies have been conducted on how to better hide the latency of long operations on GPUs. The literatures [15]-[19] have tried to dynamically choose the best warp scheduling strategy for different applications.…”

Section: Introduction (mentioning)

confidence: 99%


LFWS: Long-Operation First Warp Scheduling Algorithm to Effectively Hide the Latency for GPUs

Liu, Zhao, et al. 2023

IEICE Trans. Fundamentals


GPUs have become the dominant computing units for high-performance workloads in many computational fields. However, long operation latency leaves on-chip computing resources underutilized, which degrades performance when parallel tasks run on GPUs. A good warp scheduling strategy is an effective way to hide latency and improve resource utilization, yet most current GPU warp scheduling algorithms ignore the ability of long operations to hide latency. In this paper, we propose a long-operation-first warp scheduling algorithm, LFWS, for GPU platforms. LFWS filters warps in the ready state into a ready queue and updates the queue promptly as warp status changes. It divides the warps in the ready queue into long-operation and short-operation groups based on the type of operations in their instruction buffers, and it gives higher priority to the long-operation warps. In this way, long operations hide part of each other's latency, which enhances the system's ability to hide latency. To verify its effectiveness, we implement LFWS on the GPGPU-Sim simulation platform. Experiments over various CUDA applications compare LFWS with five other warp scheduling algorithms. The results show that LFWS achieves average performance improvements of 8.01% and 5.09% over three traditional and two novel warp scheduling algorithms, respectively, effectively improving computational resource utilization on GPUs.
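The selection policy described in this abstract (ready queue, long and short operation groups, long operations first) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Warp` class, the `LONG_OPS` opcode set, the `select_warp` function, and the oldest-first tie-break inside each group are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass

# Hypothetical set of opcodes treated as "long" operations.
# The abstract does not list which operations count as long.
LONG_OPS = {"ld.global", "st.global", "div", "sqrt"}


@dataclass
class Warp:
    warp_id: int
    next_op: str        # opcode at the head of the warp's instruction buffer
    ready: bool = True  # whether the warp can issue this cycle


def is_long(warp):
    """Classify a warp by the operation waiting in its instruction buffer."""
    return warp.next_op in LONG_OPS


def select_warp(warps):
    """Return the warp to issue next, or None if no warp is ready.

    Ready warps are split into a long-operation group and a
    short-operation group; the long group has priority.
    """
    ready_queue = [w for w in warps if w.ready]
    long_group = [w for w in ready_queue if is_long(w)]
    short_group = [w for w in ready_queue if not is_long(w)]
    for group in (long_group, short_group):
        if group:
            # Assumed tie-break: first warp in list order within the group.
            return group[0]
    return None
```

Rebuilding the ready queue on every call stands in for the "updates the queue in time" behavior; a real scheduler would presumably update the queue incrementally as warp status changes.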


“…Sethia et al [28] proposed MASCAR, which uses greedy scheduling techniques to detect memory saturation and limit the warp for sending memory requests at a short period of time. In order to improve the latency hiding ability, Do et al [29] proposed a long-latency operation-based warp scheduler to improve GPU performance. Liang et al [30] proposed coordinated static and dynamic cache bypassing.…”

Section: Related Work (mentioning)

confidence: 99%

Locality-Based Cache Management and Warp Scheduling for Reducing Cache Contention in GPU

2021


GPGPUs have gradually become a mainstream acceleration component in high-performance computing. The long latency of memory operations is the bottleneck of GPU performance. In the GPU, multiple threads are grouped into one warp for scheduling and execution. The L1 data caches have little capacity, and multiple warps share one small cache, so the cache suffers heavy contention and causes pipeline stalls. We propose Locality-Based Cache Management (LCM), combined with Locality-Based Warp Scheduling (LWS), to reduce cache contention and improve GPU performance. Each load instruction is classified into one of three types according to locality: streaming locality (data used only once), intra-warp locality (data accessed multiple times in the same warp), and inter-warp locality (data accessed by different warps). According to the locality of the load instruction, LCM applies cache bypassing to streaming requests to improve cache utilization, extends inter-warp memory request coalescing to make full use of inter-warp locality, and combines with LWS to alleviate cache contention. LCM and LWS effectively improve cache performance, thereby improving overall GPU performance. In our experimental evaluation, LCM and LWS obtain an average performance improvement of 26% over the baseline GPU.
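The three locality types named in this abstract can be made concrete with a small Python sketch. It is a simplification under stated assumptions, not the paper's mechanism: it classifies addresses from a complete offline trace of (warp ID, address) pairs, whereas a hardware scheme would have to classify load instructions online. The function names `classify_loads` and `should_bypass_l1` are hypothetical.

```python
from collections import defaultdict


def classify_loads(trace):
    """Classify each address in a trace of (warp_id, address) pairs.

    Returns a dict mapping address -> "streaming", "intra-warp",
    or "inter-warp". Assumes the whole trace is known up front.
    """
    warps_by_addr = defaultdict(list)
    for warp_id, addr in trace:
        warps_by_addr[addr].append(warp_id)

    result = {}
    for addr, warp_ids in warps_by_addr.items():
        if len(warp_ids) == 1:
            result[addr] = "streaming"    # used only once
        elif len(set(warp_ids)) == 1:
            result[addr] = "intra-warp"   # reused within one warp
        else:
            result[addr] = "inter-warp"   # touched by different warps
    return result


def should_bypass_l1(locality):
    """Streaming data gains nothing from caching, so it bypasses the L1."""
    return locality == "streaming"
```

Only the bypass decision for streaming requests is sketched here; inter-warp request coalescing and the warp scheduling side are left out because the abstract gives no detail on how they work.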


OKCM: improving parallel task scheduling in high-performance computing systems using online learning

Zhang, Han, et al. 2020

J Supercomput

