2012
DOI: 10.21236/ada575859

Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing

Abstract: Many "big data" applications need to act on data arriving in real time. However, current programming models for distributed stream processing are relatively low-level, often leaving the user to worry about consistency of state across the system and fault recovery. Furthermore, the models that provide fault recovery do so in an expensive manner, requiring either hot replication or long recovery times. We propose a new programming model, discretized streams (D-Streams), that offers a high-level functional API, s…
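The abstract is cut off, but the core idea (treating a stream as a series of small batches, each processed by deterministic functional operations, with state carried from one batch to the next) can be illustrated with a toy, single-machine Python sketch. All names here (`discretize`, `count_words`, `running_counts`) are hypothetical and are not the Spark Streaming API; the sketch has no distribution or fault tolerance and assumes records carry non-negative arrival times in seconds.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Record = Tuple[float, str]  # (arrival time in seconds, word); times assumed >= 0


def discretize(records: Iterable[Record], interval: float) -> List[List[str]]:
    """Split a timestamped stream into consecutive batches of `interval` seconds."""
    buckets: Dict[int, List[str]] = {}
    for t, word in records:
        buckets.setdefault(int(t // interval), []).append(word)
    if not buckets:
        return []
    # One batch per interval in time order; intervals with no records yield [].
    return [buckets.get(i, []) for i in range(max(buckets) + 1)]


def count_words(batch: List[str]) -> Counter:
    """Deterministic, stateless operation applied to a single batch."""
    return Counter(batch)


def running_counts(batches: List[List[str]]) -> List[Counter]:
    """Fold each batch into the state: state_n = state_{n-1} + count_words(batch_n)."""
    state: Counter = Counter()
    history = []
    for batch in batches:
        state = state + count_words(batch)  # builds a new Counter; the old state is untouched
        history.append(state)
    return history


records = [(0.2, "a"), (0.7, "b"), (1.1, "a"), (2.5, "a")]
batches = discretize(records, interval=1.0)
print(batches)                      # [['a', 'b'], ['a'], ['a']]
print(running_counts(batches)[-1])  # Counter({'a': 3, 'b': 1})
```

In this sketch both the per-batch count and the state update are pure functions of their inputs, so replaying the same batches always reproduces the same state; that determinism is what allows a batch-oriented model to recompute results instead of replicating them.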

Cited by 306 publications (389 citation statements). References 25 publications.

“…Although, streaming regression algorithms (e.g. Spark Streaming [19]) based on micro batch analysis [20] can provide faster solution but these algorithms do it at the expense of less accuracy. In our earlier work [21], we proposed a solution highlighting these drawbacks and presented initial results.…”
Section: B Motivation (mentioning, confidence: 99%)
“…Spark introduces in memory partitions and computing, thereby reducing frequent hard disk reads and writes, which improves the response time which is the key characteristic of stream computing. Discretized Streams [11] are tuples of Resilient Distributed Data Sets (RDDs) [10], which process streams as short, deterministic tasks which are also stateless. RDDs reconstruct themselves through lineage information, thereby achieving fault tolerance [12].…”
Section: A Streaming Data Analytics (mentioning, confidence: 99%)
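The lineage idea in the excerpt above (rebuilding lost data by recomputing it from the operations that produced it, rather than restoring it from a replica) can be sketched in plain Python. `ToyRDD`, `map_partitions`, `get_partition`, and `collect` are hypothetical stand-ins, not Spark's RDD class; the sketch keeps every partition in one process and simulates a failure by clearing a partition slot to `None`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyRDD:
    """Hypothetical in-memory dataset that records how it was derived (its lineage)."""

    partitions: List[Optional[list]]  # None marks a partition lost to a failure
    parent: Optional["ToyRDD"] = None  # dataset this one was derived from
    fn: Optional[Callable[[list], list]] = None  # deterministic per-partition transform

    def map_partitions(self, fn: Callable[[list], list]) -> "ToyRDD":
        # Build a child dataset and remember (parent, fn) so it can be recomputed later.
        return ToyRDD([fn(p) for p in self.collect()], parent=self, fn=fn)

    def get_partition(self, i: int) -> list:
        if self.partitions[i] is None:
            if self.parent is None or self.fn is None:
                raise RuntimeError(f"source partition {i} lost and has no lineage")
            # Recover by replaying the transform on the parent's partition,
            # recursively recovering the parent first if it is also lost.
            self.partitions[i] = self.fn(self.parent.get_partition(i))
        return self.partitions[i]

    def collect(self) -> List[list]:
        return [self.get_partition(i) for i in range(len(self.partitions))]


source = ToyRDD([[1, 2], [3, 4]])
doubled = source.map_partitions(lambda p: [x * 2 for x in p])
shifted = doubled.map_partitions(lambda p: [x + 1 for x in p])

# Simulate a node failure that loses partition 1 of both derived datasets.
doubled.partitions[1] = None
shifted.partitions[1] = None
print(shifted.collect())  # [[3, 5], [7, 9]]
```

Only the lost partitions are recomputed, and because each `fn` is deterministic the rebuilt values equal the lost ones. A lost partition of `source` itself cannot be rebuilt in this toy; the sketch assumes input data is kept durably (for example, replicated) somewhere else.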
“…Obviously, more efficient algorithms are required, and thus, the RkNN problem has been studied extensively in the past years for centralized environments [16]. But, with the fast increase in the scale of the big input datasets, parallel and distributed algorithms for RkNNQ in MapReduce [2] have been designed and implemented [6,7], and there are no RkNNQ implementations in Spark [17].…”
Section: Introduction (mentioning, confidence: 99%)