2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca.2019.00036
Kelp: QoS for Accelerated Machine Learning Systems

Cited by 18 publications (5 citation statements) · References 38 publications
“…In addition, they do not investigate the performance of existing serverless systems and pure serverless model serving systems while only working with AWS Lambda. Zhu et al. [36] introduced Kelp, a software runtime that strives to isolate high-priority accelerated ML tasks from memory resource interference. They argue that when using accelerated machine learning, contention on host resources can significantly impact the efficiency of the accelerator.…”
Section: Related Work
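The host-side interference problem described above can be illustrated with a toy core-partitioning helper. This is a hypothetical sketch, not Kelp's actual mechanism; all names are illustrative.

```python
# Toy sketch (not Kelp's actual mechanism): statically partition host
# cores so the thread feeding the accelerator never shares cores with
# best-effort antagonist tasks that could contend for host resources.

def partition_cores(total_cores, reserved_for_ml):
    """Reserve the first `reserved_for_ml` cores for the high-priority
    accelerated-ML host thread; the rest go to best-effort work."""
    if reserved_for_ml >= total_cores:
        raise ValueError("must leave at least one best-effort core")
    ml_cores = set(range(reserved_for_ml))
    best_effort_cores = set(range(reserved_for_ml, total_cores))
    return ml_cores, best_effort_cores

if __name__ == "__main__":
    ml, be = partition_cores(8, 2)
    print(sorted(ml), sorted(be))  # [0, 1] [2, 3, 4, 5, 6, 7]
```

On Linux, such a partition could then be applied with `os.sched_setaffinity`, though real isolation of shared resources like memory bandwidth requires hardware or OS support beyond core pinning.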
“…Zhang et al. [8] propose a runtime system that exploits the newly added spatial multitasking feature in a GPU and raises accelerator utilization while achieving the latency targets for user-facing services. Zhu et al. [9] propose a software runtime that isolates high-priority accelerated machine learning tasks from memory resource interference. OS-level scheduling methods commonly implement their design on a non-preemptive accelerator, in line with the modern GPU's execution model, which unavoidably makes them ineffective in some cases (e.g., the priority-inversion problem caused by long-running lower-priority kernels).…”
Section: Related Work
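The priority-inversion problem on a non-preemptive accelerator mentioned above can be shown with a minimal queue simulation. This is an illustrative sketch, not code from any of the cited papers.

```python
# Minimal simulation of priority inversion on a non-preemptive
# accelerator: once a long low-priority kernel starts executing, a
# later high-priority kernel must wait for it to finish, because the
# running kernel cannot be preempted.

def fifo_completion_times(kernels):
    """kernels: list of (name, submit_time, duration) tuples, served
    in FIFO order without preemption. Returns {name: completion_time}."""
    t = 0
    done = {}
    for name, submit, duration in sorted(kernels, key=lambda k: k[1]):
        t = max(t, submit) + duration  # wait for the device, then run
        done[name] = t
    return done

done = fifo_completion_times([
    ("low_prio_long", 0, 100),  # long-running background kernel
    ("high_prio", 1, 5),        # latency-critical kernel arrives later
])
# high_prio completes at t=105 instead of t=6: priority inversion.
```

A preemptive or spatially multitasked accelerator would let the high-priority kernel run almost immediately, which is the gap these runtimes try to close.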
“…Today's private datacenters host a diverse range of data-intensive applications, including machine learning training [7,22,38,51,60,62], SQL queries [9,30,58], graph processing [21,39,41], and big-data analytics [47,52]. Many of these applications are distributed and leverage parallel frameworks, such as Hadoop, Spark, Flink, and TensorFlow [1,8,17,53,61].…”
Section: Introduction