Proceedings of the Seventh ACM Symposium on Cloud Computing 2016
DOI: 10.1145/2987550.2987563

Job-aware Scheduling in Eagle

Abstract: We present Eagle, a new hybrid data center scheduler for data-parallel programs. Eagle dynamically divides the nodes of the data center into partitions for the execution of long and short jobs, thereby avoiding head-of-line blocking. Furthermore, it provides job awareness and avoids stragglers through a new technique called Sticky Batch Probing (SBP). The dynamic partitioning of the data center nodes is accomplished by a technique called Succinct State Sharing (SSS), in which the distributed schedulers are informed o…

Cited by 69 publications (14 citation statements)
References 23 publications
“…The problem of scheduling SRAs has been widely addressed in the literature, leveraging task reordering techniques to prevent head of line blocking [23], [24], and introducing also task bandwidth requirements to cope with the most network demanding tasks [25]- [27]. Still, inaccurate estimates of job completion time can be difficult to mitigate due to external factors such as data size, network congestion, and resource contention which make expected completion time highly variable.…”
Section: Related Work
confidence: 99%
“…The problem has been however widely addressed in the context of data centers since public Cloud computing has emerged as the most promising solution to host companies' IT services. A simple and flexible family of algorithms handles the problem one job per time, i.e., each unscheduled job is first retrieved from a queue and then assigned to a computation unit regardless of the other jobs that are still in the queue [23], [28]. This approach has the limitation of committing early to suboptimal decisions that can prevent the placement of subsequent jobs.…”
Section: Related Work
confidence: 99%
“…In order to overcome the above limitations, the hybrid cloud scheduling models [11], [10], [23] were developed as the combinations of both centralized and distributed models in the way presented in Figure 6. The hybrid scheduler delivers centralized high-quality scheduling decisions for long-running resource-demanding jobs and sub-optimal fast scheduling decisions for short and latency-sensitive jobs.…”
Section: Distributed Cloud Schedulers Distributed Schedulers
confidence: 99%
“…One of the practical issues in online scheduling is the estimation of job processing times which are usually unknown. Delgado et al [18] proposed a hybrid scheduler that was tested under Google trace dataset and showed high resistance to misestimation of task duration. A number of papers [15,19] analyze publicly available trace datasets from Google or Microsoft and conclude there is a lot of repetitiveness of tasks (over 60% of tasks are recurring).…”
Section: Introduction
confidence: 99%
“…One of the most effective algorithms for minimizing total flowtime of jobs is the Shortest Remaining Processing Time (SRPT) algorithm, for which strong bounds were proven [20]. The SRPT algorithm was thus used in several papers [6,15,18]. Research of lower and upper bounds and competitive ratio is common with online algorithms.…”
Section: Introduction
confidence: 99%
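
The Shortest Remaining Processing Time (SRPT) rule named in the last statement can be illustrated with a small single-machine simulation. This is a minimal sketch, not code from Eagle or from the citing papers: the function name `srpt_total_flowtime` and the `(release_time, processing_time)` job representation are assumptions made for illustration, and processing times are assumed to be known exactly.

```python
import heapq


def srpt_total_flowtime(jobs):
    """Simulate preemptive SRPT on one machine (illustrative sketch).

    jobs: list of (release_time, processing_time) pairs (hypothetical format).
    Returns the total flow time, i.e. the sum of (completion - release).
    """
    jobs = sorted(jobs)          # process arrivals in release order
    heap = []                    # (remaining_time, release_time) of waiting jobs
    t = 0                        # current simulation time
    i = 0                        # index of the next job that has not arrived yet
    total = 0
    n = len(jobs)
    while i < n or heap:
        if not heap:
            # Machine is idle: jump ahead to the next arrival.
            t = max(t, jobs[i][0])
        # Admit every job released up to the current time.
        while i < n and jobs[i][0] <= t:
            heapq.heappush(heap, (jobs[i][1], jobs[i][0]))
            i += 1
        # Run the job with the least remaining work.
        remaining, release = heapq.heappop(heap)
        next_arrival = jobs[i][0] if i < n else float("inf")
        # Run until the job finishes or a new arrival may preempt it.
        run = min(remaining, next_arrival - t)
        t += run
        if run == remaining:
            total += t - release                      # job completed
        else:
            heapq.heappush(heap, (remaining - run, release))  # preempted
    return total


# A long job released at time 0 is preempted by a short job released at time 1.
print(srpt_total_flowtime([(0, 5), (1, 2)]))  # prints 9
```

In this example the short job finishes at time 3 (flow time 2) and the long job at time 7 (flow time 7), for a total of 9; running the jobs in arrival order without preemption would give 5 + 6 = 11.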