2014 IEEE Symposium on Computer Applications and Communications
DOI: 10.1109/scac.2014.10

A Performance Prediction Model for Memory-Intensive GPU Kernels

Abstract: Commodity graphics processing units (GPUs) have rapidly evolved into high-performance accelerators for data-parallel computing, combining a large array of processing cores with the CUDA programming model and its C-like interface. However, optimizing an application for maximum performance on the GPU architecture is not a trivial task, owing to the tremendous change from conventional multi-core to manycore architectures. Besides, GPU vendors do not disclose much detail about the characteristics of the GPU…

Cited by 3 publications (2 citation statements). References 12 publications.
“…Also, array access affects how memory is accessed, which is critical for performance diagnosis [44]. The performance of memory in GPUs relies on the memory access pattern that describes how data indexes are referred to by consecutive threads within a warp [33]. For example, coalesced memory access improves the performance of a program as a warp will not access multiple memory transactions [33].…”
Section: Source Code Analysis
confidence: 99%
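The coalescing behavior quoted above can be illustrated with a small model. The sketch below counts how many memory transactions a single warp would issue for a given per-thread indexing pattern; the 32-thread warp width, 128-byte transaction size, and 4-byte elements are typical NVIDIA-style assumptions for illustration, not figures taken from the paper or the citing works.

```python
# Toy model of GPU global-memory coalescing: count how many 128-byte
# transactions one 32-thread warp needs for a given access pattern.
# Warp width, transaction size, and element size are illustrative
# NVIDIA-style defaults (assumptions, not from the cited works).

WARP_SIZE = 32
TRANSACTION_BYTES = 128
ELEM_BYTES = 4

def transactions_for_warp(index_of):
    """index_of(t) -> array element index accessed by thread t of the warp.
    Returns the number of distinct 128-byte segments the warp touches."""
    segments = {(index_of(t) * ELEM_BYTES) // TRANSACTION_BYTES
                for t in range(WARP_SIZE)}
    return len(segments)

# Coalesced: consecutive threads read consecutive elements,
# so the whole warp is served by a single transaction.
coalesced = transactions_for_warp(lambda t: t)

# Strided by 32 elements: every thread lands in its own
# 128-byte segment, so the warp needs 32 transactions.
strided = transactions_for_warp(lambda t: t * 32)

print(coalesced, strided)  # 1 32
```

This mirrors the quoted statement: with coalesced indexing the warp does not split into multiple memory transactions, while a scattered pattern multiplies the transaction count and hence the memory cost.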
“…The performance of memory in GPUs relies on the memory access pattern that describes how data indexes are referred to by consecutive threads within a warp [33]. For example, coalesced memory access improves the performance of a program as a warp will not access multiple memory transactions [33]. Also, if the cache is not aligned in a CUDA program, it affects the performance by 32 times more than cache-aligned [54].…”
Section: Source Code Analysis
confidence: 99%
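The alignment effect mentioned in the second statement can be sketched with the same kind of transaction-counting model. Here an otherwise coalesced warp load that starts off a 128-byte boundary spills into a second transaction; this shows only the per-warp mechanism, a much milder effect than the 32x program-level figure quoted above, and again uses assumed NVIDIA-style sizes.

```python
# Toy model of alignment: the same coalesced 32 x 4-byte warp load
# needs one 128-byte transaction when its base address is aligned,
# but two when it is shifted off a 128-byte boundary. Sizes are
# illustrative assumptions, not taken from the cited works.

WARP_SIZE = 32
TRANSACTION_BYTES = 128
ELEM_BYTES = 4

def transactions(base_byte):
    """Number of 128-byte segments touched by a coalesced warp load
    starting at byte offset base_byte."""
    segments = {(base_byte + t * ELEM_BYTES) // TRANSACTION_BYTES
                for t in range(WARP_SIZE)}
    return len(segments)

print(transactions(0))   # aligned start: 1 transaction
print(transactions(4))   # shifted by one element: 2 transactions
```

Doubling the transaction count for every warp is what makes misaligned layouts expensive; keeping array base addresses on transaction-size boundaries avoids the extra fetch.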