2015
DOI: 10.14778/2809974.2809979

Supporting scalable analytics with latency constraints

Abstract: Recently there has been significant interest in building big data analytics systems that can handle both "big data" and "fast data". Our work is strongly motivated by recent real-world use cases that point to the need for a general, unified data processing framework to support analytical queries with different latency requirements. Toward this goal, we start with an analysis of existing big data systems to understand the causes of high latency. We then propose an extended architecture with mini-batches as gr…

Cited by 32 publications (23 citation statements)
References 33 publications (47 reference statements)

“…For each dataflow program, we model each user objective as a function over all tunable parameters of the runtime system. Learning such a model for each user objective and a specific cluster environment has the potential to adapt to different objectives, hardware, and software characteristics, while static models [2,3,6] often fail to adapt due to hard-coded function shapes and constants.…”
Section: Key Techniques (mentioning)
confidence: 99%
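
The learned-model approach described in this citation can be sketched concretely. Below is a minimal illustration, assuming hypothetical tunable parameters (parallelism, batch interval, memory) and synthetic measurements; the actual objectives, parameters, and model family are not specified by the citing paper.

# A minimal sketch of the idea above: learn one model per user objective
# (e.g., latency, throughput) as a function of the runtime's tunable
# parameters, rather than relying on a static, hard-coded formula.
# Parameter names and training data here are synthetic and illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Hypothetical knobs per configuration: [parallelism, batch_interval_ms, memory_gb]
configs = rng.uniform([1, 100, 1], [64, 5000, 32], size=(200, 3))

# Synthetic per-configuration measurements of two objectives.
latency_ms = 50 + configs[:, 1] / configs[:, 0] + rng.normal(0, 5, 200)
throughput = 1e3 * configs[:, 0] / (1 + configs[:, 1] / 1e3) + rng.normal(0, 50, 200)

# One learned model per objective, all over the same parameter space.
models = {
    "latency_ms": GradientBoostingRegressor().fit(configs, latency_ms),
    "throughput": GradientBoostingRegressor().fit(configs, throughput),
}

candidate = np.array([[16, 500, 8]])  # a configuration to evaluate
for objective, model in models.items():
    print(objective, float(model.predict(candidate)[0]))
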
“…(a) Big-Bench (TPCx-BB) for batch analytics includes 30 workloads, which can be divided into 14 SQL tasks, 11 SQL tasks with UDFs, and 5 ML workloads. (b) We also designed a new stream benchmark by extending previous workloads on click-stream analysis [3] to include stream SQL queries, SQL+UDF queries, and machine learning tasks. As suggested by our industry collaborators, these workloads have been parameterized in different ways to control the similarity among workloads.…”
Section: Demonstration (mentioning)
confidence: 99%
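
As an illustration of the parameterization mentioned above, a click-stream workload could expose knobs such as window length and a filter threshold; varying them controls how similar two generated queries are. The query template and parameter names below are hypothetical, not taken from the benchmark itself.

# Hypothetical, simplified example of a parameterized stream SQL workload.
WINDOWED_CLICK_QUERY = """
SELECT page_id, COUNT(*) AS clicks
FROM clicks
GROUP BY TUMBLE(event_time, INTERVAL '{window_s}' SECOND), page_id
HAVING COUNT(*) > {min_clicks}
"""

def make_workload(window_s, min_clicks):
    # Each (window_s, min_clicks) pair yields one workload instance; the
    # distance between parameter values controls workload similarity.
    return WINDOWED_CLICK_QUERY.format(window_s=window_s, min_clicks=min_clicks)

print(make_workload(window_s=10, min_clicks=5))
print(make_workload(window_s=60, min_clicks=100))
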
“…Twitter Heron [38] performs user-defined thread allocation and mapping via the Aurora scheduler. The paper [42] proposes an analytical model for resource allocation and dynamic mapping to meet latency requirements while maximizing throughput for processing real-time streams on Hadoop. Stela [72] uses effective throughput percentage (ETP) as the metric to decide which task to scale when the user requests scaling in/out with a given number of machines.…”
Section: Scheduling for DSPs (mentioning)
confidence: 99%
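
The ETP metric referenced here can be sketched in simplified form: an operator's effective throughput percentage is the share of total sink throughput it can still deliver through non-congested downstream operators. The topology and numbers below are hypothetical, and the actual metric in Stela [72] is more involved.

# Simplified, illustrative sketch of effective throughput percentage (ETP).
def etp(op, edges, sink_throughput, congested):
    """edges: {operator: [downstream operators]}; operators with no outgoing
    edges are sinks. Congested downstream operators block a path's contribution."""
    total = sum(sink_throughput.values())

    def reachable(o, is_root):
        if not is_root and o in congested:   # path blocked by congestion
            return 0.0
        if not edges.get(o):                 # sink: contributes its own throughput
            return sink_throughput.get(o, 0.0)
        return sum(reachable(d, False) for d in edges[o])

    return reachable(op, True) / total

edges = {"spout": ["A", "B"], "A": ["sink1"], "B": ["sink2"]}
sink_throughput = {"sink1": 6000.0, "sink2": 4000.0}
print(etp("spout", edges, sink_throughput, congested={"B"}))  # 0.6
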
“…An analytic model for computing the latencies of MapReduce tasks is presented in [25]. [26] proposes a stochastic cost model for generic workflow tasks but does not consider different degrees of parallelism.…”
Section: Related Work (mentioning)
confidence: 99%
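
For context on what such an analytic latency model looks like, here is a back-of-envelope, wave-based estimate for a MapReduce job. This is not the model from [25]; its structure and all parameter values are illustrative.

# Back-of-envelope latency estimate for a MapReduce job using "waves":
# with S slots, N tasks complete in ceil(N / S) sequential rounds.
import math

def mapreduce_latency_s(num_maps, num_reduces, map_slots, reduce_slots,
                        avg_map_s, avg_shuffle_s, avg_reduce_s):
    map_waves = math.ceil(num_maps / map_slots)
    reduce_waves = math.ceil(num_reduces / reduce_slots)
    return map_waves * avg_map_s + avg_shuffle_s + reduce_waves * avg_reduce_s

# Example: 400 map tasks on 100 slots, 50 reduce tasks on 25 slots.
print(mapreduce_latency_s(400, 50, 100, 25,
                          avg_map_s=30, avg_shuffle_s=20, avg_reduce_s=60))
# -> 4 * 30 + 20 + 2 * 60 = 260 seconds
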