Proceedings of the 2016 International Conference on Management of Data (2016)
DOI: 10.1145/2882903.2904441

Realtime Data Processing at Facebook

Abstract: Realtime data processing powers many use cases at Facebook, including realtime reporting of the aggregated, anonymized voice of Facebook users, analytics for mobile applications, and insights for Facebook page administrators. Many companies have developed their own systems; we have a realtime data processing ecosystem at Facebook that handles hundreds of Gigabytes per second across hundreds of data pipelines. Many decisions must be made while designing a realtime stream processing system. In this paper, we iden…

Cited by 94 publications (50 citation statements)
References: 19 publications
“…It provides adaptors for commonly accessed data sources such as MySQL, S3, Kafka, and Salesforce. Other similar systems include Scribe [20], a messaging system as Facebook; Siphon [21], a messaging system in Microsoft Azure HDInsight that utilize Kafka. Marcu et al [12] developed KerA, a data ingestion framework that alleviate the limitations of Kafka and other ingestion systems.…”
Section: Dataflow Output and Performance Improvement
Citation type: mentioning
confidence: 99%
“…Streaming data represents a continuous flow of data that needs to be processed by systems equipped to ingest, process, store and analyze the data. Existing data stream processing systems (DSPS) [18][19][20][21] typically include the following main components: (i) streaming data sources, (ii) data ingestion systems, (iii) data stream processing engines (DSPE), (iv) storage systems, (v) resource management services, and (vi) data sink to channel the output to other DSPSs, storage or visualization tools. Fig.…”
Section: A. Streaming Data Processing Systems
Citation type: mentioning
confidence: 99%
“…The DSPS at Facebook [21] powers many use cases such as the real-time reporting of the aggregated voice of Facebook users, analytics for mobile applications, and insights for Facebook page administrators. It is made up of data sources such as mobile and web products; Scribe as a data distribution tool; stream processing systems such as Puma, Stylus, and Swift; and data stores such as Laser, Scuba, and Hive.…”
Section: B. DSPS
Citation type: mentioning
confidence: 99%
“…Distributed stream processing systems like Storm [7] and Heron [8] are frequently used for real-time analysis, online machine learning and continuous computing. As also identified by researchers at Facebook [9], the ease of use, performance, fault tolerance, scalability and correctness are five important design decisions for real-time data stream processing systems. Two architectural styles for real-time data processing are the Lambda architecture (http://lambda-architecture.net, by Nathan Marz, creator of Storm) [10] and the Kappa architecture (http://kappa-architecture.com, by Jay Kreps, from LinkedIn).…”
Section: Introduction
Citation type: mentioning
confidence: 99%