2016 IEEE International Conference on Big Data (Big Data) 2016
DOI: 10.1109/bigdata.2016.7840603

Thrill: High-performance algorithmic distributed batch data processing with C++

Abstract: We present the design and a first performance evaluation of Thrill, a prototype of a general-purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink, with at least two main differences. First, Thrill is based on C++, which enables performance advantages due to direct native code compilation, a more cache-friendly memory layout, and explicit memory management. In particular, Thrill uses template meta-programming to c…

Cited by 37 publications (40 citation statements)
References 17 publications
“…2). The use of asynchronous input/output (IO) has been extensively studied for problems involving big data applications, particularly on distributed systems [23] such as supercomputers and clusters. Performance tuning in such cases involves selecting a number of parameters that are highly system dependent, particularly for heterogeneous computers.…”
Section: Methods (mentioning)
confidence: 99%
“…We explored using the Thrill [26] library to track the most energetic particles for the results of VPIC plasma physics simulation [32]. Thrill is a research project that aims to provide a bridge between big data analytics and HPC platforms.…”
Section: Solution Approach (mentioning)
confidence: 99%
“…Another important step was to build a prototype of a tool for implementing algorithms that process large data sets on distributed memory machines. The result, Thrill [7], is based on C++, offers a rich set of operations on distributed arrays such as map, reduce, sort, merge, and prefix-sum. It can fuse pipelines of local operations into tight loops optimized at compile time, considerably outperforming established tools such as Spark or Flink.…”
Section: P16 Massive Text Indices J Fischer (TU Dortmund) and P San (mentioning)
confidence: 99%
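
The statement above says that Thrill fuses pipelines of local operations into tight loops at compile time. The sketch below illustrates that general idea in plain C++14 on a single in-memory array. It is not Thrill's API: `compose`, `map_reduce_sum`, `prefix_sum`, and `example` are hypothetical names introduced only for this illustration, and a `std::vector` stands in for a distributed array.

```cpp
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Thrill): compose two callables into one.
// The result is a distinct closure type known at compile time, so the
// compiler is free to inline both stages into the caller's loop.
template <typename F, typename G>
auto compose(F f, G g) {
    return [f = std::move(f), g = std::move(g)](auto x) { return g(f(x)); };
}

// Map followed by a sum reduction in a single pass. No intermediate
// array is materialized between the map stages and the reduction.
template <typename Fn>
std::int64_t map_reduce_sum(const std::vector<std::int64_t>& in, Fn fn) {
    std::int64_t acc = 0;
    for (std::int64_t x : in) acc += fn(x);
    return acc;
}

// Inclusive prefix sum over a local array, standing in for the
// distributed prefix-sum operation named in the statement.
std::vector<std::int64_t> prefix_sum(const std::vector<std::int64_t>& in) {
    std::vector<std::int64_t> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
    return out;
}

// Usage sketch: sum of (x + 1)^2 over {1, 2, 3, 4}.
std::int64_t example() {
    std::vector<std::int64_t> v{1, 2, 3, 4};
    auto fused = compose([](std::int64_t x) { return x + 1; },
                         [](std::int64_t x) { return x * x; });
    return map_reduce_sum(v, fused);
}
```

Because both map stages are ordinary closure types, the loop in `map_reduce_sum` is instantiated once for the composed function, which is the single-machine analogue of fusing a pipeline of local operations. A real distributed framework additionally has to handle partitioning and communication, which this sketch leaves out.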
“…: Broccoli [4] for semantic search, GENO 3 for generic optimization code generation, NetworKit [28] for network analysis, STXXL [11] for external-memory computing, and Thrill [7] for distributed batch data processing. The priority programme also creates visibility by its national and international events (e.g., Summer/Winter schools in Chennai 2016 and Tel Aviv 2017).…”
Section: Scientific Output and SPP Collaborations (mentioning)
confidence: 99%