2010
DOI: 10.14778/1920841.1921056

Massively parallel data analysis with PACTs on Nephele

Abstract: Large-scale data analysis applications require the processing and analysis of terabytes or even petabytes of data, particularly in the areas of web analysis or scientific data management. This trend has been discussed as "web-scale data management" in a panel at VLDB 2009. Formerly, parallel data processing was the domain of parallel database systems. Today, novel requirements like scaling out to thousands of machines, improved fault tolerance, and schema-free processing have made a case for new approaches.

Cited by 42 publications (20 citation statements)
References: 5 publications
“…The disruptions discussed in this study present exciting opportunities for the business intelligence community. We started to address hard problems, the solution of which can greatly impact the future course of data platforms and tools such as identifying text mining services for niche data [7], or investigating data processing infrastructures for large scale data such as [1].…”
Section: Results (mentioning)
confidence: 99%
“…The plan with the lowest cost is selected as an optimized query plan. A PACT program is executed on a three-tier architecture [38] composed of: a PACT compiler, engine Nephele [39], and a distributed file system.…”
Section: Parallelism In Traditional Data Flow (mentioning)
confidence: 99%
“…These limitations led to the development of workflow engines targeted at large-scale data analytics [7,4,23]. These engines allow for more flexible compositions of user-defined code than map-reduce while keeping the ability to parallelize tasks.…”
Section: Workflow Engines (mentioning)
confidence: 99%
“…In this paper we extend these ideas to support both relational operators and UDFs. A similar approach is PACT [4] where contracts are bound to tasks as pre- and post-conditions. These contracts enable rewrites of the workflow.…”
Section: Workflow Engines (mentioning)
confidence: 99%