2019
DOI: 10.3390/a12020037

Stream Data Load Prediction for Resource Scaling Using Online Support Vector Regression

Abstract: A distributed data stream processing system must handle real-time streaming data loads that are variable and can surge suddenly. Elastic resource allocation is therefore a fundamental and challenging problem: a fixed strategy results in either wasted resources or reduced QoS (quality of service). Spark Streaming, an emerging system, processes real-time stream data analytics using a micro-batch approach. In this paper, first, we propose an improved SVR (support vector regression) based stream da…

Citations: cited by 12 publications (8 citation statements)
References: 22 publications
“…3) TWRES: We employ a second dynamic scaling baseline inspired from recent related work. Precisely, we use the resource scaling algorithm (TWRES) proposed in [10] for Spark streaming jobs. Similar to Phoebe, this algorithm requires profiling data, and scales a data processing application under consideration of workload forecasts, a performance model for the maximum processing capacity of individual scaleouts, and formulated latency constraints.…”
Section: B Phoebe Setup (mentioning)
confidence: 99%
“…1) Top Speed Windowing (TSW) Experiment: For the first experiment, a DSP job was selected from the official Flink repository. It was modified so that sources consumed events from and sinks published results to separate Apache Kafka topics.…”
Section: Streaming Jobs (mentioning)
confidence: 99%
“…6) Avazu: This dataset is created by using a click-through rate prediction dataset from Kaggle 9 , aggregating the clicks per hour over time, and linearly interpolating between the aggregated values to obtain different sampling rates.…”
Section: Time Series Datasets (mentioning)
confidence: 99%
“…However, none has been directly compared under our defined requirements for performing TSF in DSP systems, such as minimal configuration and limited model inputs. In the context of DSP, TSF methods have been used in diverse forms and for varying reasons [9], [35]-[38]. While previous works successfully apply a selected method to a concrete problem, to the best of our knowledge, there is no related work that compares multiple TSF methods for DSP.…”
Section: Related Work (mentioning)
confidence: 99%