2012 IEEE 28th International Conference on Data Engineering Workshops
DOI: 10.1109/icdew.2012.66

Same Queries, Different Data: Can We Predict Runtime Performance?

Cited by 30 publications (25 citation statements). References: 6 publications.
“…It is impossible to evaluate HFSP in a real deployment and in the complete absence of estimation errors, since execution time of a given job in Hadoop varies at each run, according to complex and rather unpredictable system properties [12], [28]. To isolate the impact of errors on scheduling and sojourn time, we thus turn to our simulation results on SRVT which is a size-based scheduler with aging induced by fair sharing in virtual time.…”
Section: Estimation Errors and Sojourn Times (mentioning)
“…Job Size Estimation: Various recent approaches [9]- [12] propose techniques to estimate query sizes in recurring jobs. Agarwal et al [11] report that recurring jobs are around 40% of all those running in Bing's production servers.…”
Section: Fairness and QoS (mentioning)
“…For running jobs we continue to refine our work estimates by extrapolating based on data from the completed tasks. All of this can be improved in the future, for example by incorporating the techniques in [19]. Better estimates should improve the quality of our FlowFlex scheduler.…”
Section: Cluster Experiments (mentioning)
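The excerpt above describes refining a running job's work estimate by extrapolating from its completed tasks. The sketch below illustrates one simple form of that idea under stated assumptions: all tasks of a job are roughly equal in cost, so the mean duration of finished tasks stands in for the unfinished ones, and a prior estimate is used until any task completes. The function names are hypothetical and do not come from the FlowFlex paper.

```python
from statistics import mean


def estimate_total_work(completed_durations: list[float],
                        total_tasks: int,
                        prior_estimate: float) -> float:
    """Hypothetical helper: extrapolate a job's total work (task-seconds).

    Assumes tasks are roughly homogeneous, so the mean duration of the
    completed tasks is used for every task that has not finished yet.
    Falls back to `prior_estimate` until at least one task completes.
    """
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    if len(completed_durations) > total_tasks:
        raise ValueError("more completed tasks than total tasks")
    if not completed_durations:
        return prior_estimate
    observed = sum(completed_durations)
    remaining_tasks = total_tasks - len(completed_durations)
    return observed + remaining_tasks * mean(completed_durations)


def estimate_remaining_work(completed_durations: list[float],
                            total_tasks: int,
                            prior_estimate: float) -> float:
    """Hypothetical helper: estimated work still outstanding for the job."""
    total = estimate_total_work(completed_durations, total_tasks, prior_estimate)
    return total - sum(completed_durations)


if __name__ == "__main__":
    # 3 of 10 tasks done, taking 10 s, 12 s and 14 s (mean 12 s).
    print(estimate_total_work([10, 12, 14], 10, prior_estimate=100.0))
    print(estimate_remaining_work([10, 12, 14], 10, prior_estimate=100.0))
```

Each time another task finishes, the estimate is recomputed. This is the "continue to refine" behaviour the excerpt mentions, although the actual refinement rule used in the cited work may differ.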
“…Our approach uses minimal statistics about the input datasets (e.g., tuple size and number of tuples), which are complemented with historical information about prior query executions (e.g., execution time). More details on the predictions module have been published previously [27].…”
Section: Flex Scheduler (mentioning)
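The last excerpt combines minimal input statistics (tuple size and number of tuples) with historical execution times of prior runs of the same query. The sketch below shows one plausible way to do that, assuming a linear relationship between input volume (tuple size × number of tuples) and execution time, fitted by ordinary least squares. The linear model, the `Run` record and the function names are all illustrative assumptions, not the prediction module described in the cited work.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical record of one prior execution of the same query."""
    tuple_size: float   # bytes per tuple
    num_tuples: int     # tuples in the input dataset
    exec_time: float    # observed runtime in seconds


def fit_linear(runs: list[Run]) -> tuple[float, float]:
    """Least-squares fit of exec_time = intercept + slope * input_bytes."""
    xs = [r.tuple_size * r.num_tuples for r in runs]
    ys = [r.exec_time for r in runs]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two historical runs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct input sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_runtime(history: list[Run], tuple_size: float, num_tuples: int) -> float:
    """Predict the runtime of the same query on a new dataset."""
    intercept, slope = fit_linear(history)
    return intercept + slope * tuple_size * num_tuples


if __name__ == "__main__":
    history = [
        Run(100, 10_000, 3.0),
        Run(100, 20_000, 4.0),
        Run(100, 40_000, 6.0),
    ]
    print(predict_runtime(history, tuple_size=100, num_tuples=30_000))
```

A linear model is only a starting point. Queries whose cost grows super-linearly with input size, such as joins, would need a different feature or model. The paper's question of whether runtime for the same query on different data can be predicted is exactly about how well such models generalize.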