Thrill: High-performance algorithmic distributed batch data processing with C++

Bingmann, Timo; Axtmann, Michael; Jöbstl, Emanuel; Lamm, Sebastian; Nguyen, Huyen Chau; Noe, Alexander; Schlag, Sebastian; Stumpp, Matthias; Sturm, Tobias; Sanders, Peter

doi:10.1109/bigdata.2016.7840603

Cited by 37 publications

(40 citation statements)

References 17 publications

“…2). The use of asynchronous input/output (IO) has been extensively studied for problems involving big data applications, particularly on distributed systems [23] such as supercomputers and clusters. Performance tuning in such cases involves selecting a number of parameters that are highly system dependent, particularly for heterogeneous computers.…”

Section: Methods (mentioning)

confidence: 99%

SIproc: an open-source biomedical data processing platform for large hyperspectral images

Berisha, Chang, Saki, et al. 2017. Analyst

There has recently been significant interest within the vibrational spectroscopy community to apply quantitative spectroscopic imaging techniques to histology and clinical diagnosis. However, many of the proposed methods require collecting spectroscopic images that have a similar region size and resolution to corresponding histological images. Since spectroscopic images contain significantly more spectral samples than traditional histology, the resulting data sets can approach hundreds of gigabytes to terabytes in size. This makes them difficult to store and process, and the tools available to researchers for handling large spectroscopic data sets are limited. Fundamental mathematical tools, such as MATLAB, Octave, and SciPy, are extremely powerful but require data to be stored in a fraction of the available system memory. These memory limitations become impractical for even modestly sized histological images, which can be hundreds of gigabytes in size. In this paper, we propose an open-source toolkit designed to perform out-of-core processing of hyperspectral images. By taking advantage of graphical processing unit (GPU) computing combined with adaptive data streaming, our software alleviates common workstation memory limitations while achieving better performance than existing applications.

“…We explored using the Thrill [26] library to track the most energetic particles for the results of VPIC plasma physics simulation [32]. Thrill is a research project that aims to provide a bridge between big data analytics and HPC platforms.…”

Section: Solution Approach (mentioning)

confidence: 99%

The ISTI Rapid Response on Exploring Cloud Computing 2018

Coffrin, Arnold, Eidenbenz, et al. 2018

CloudFront: CloudFront provides a fast and secure content delivery service for web hosting. CloudFront simplifies the process of delivering content with low latency and high bandwidth across the globe and provides basic threat mitigation tools, for example to protect the web service from DDoS attacks.

Route 53: Route 53 is a reliable and scalable DNS service that makes it easy to route users to web applications hosted at specific IP addresses.

“…Another important step was to build a prototype of a tool for implementing algorithms that process large data sets on distributed memory machines. The result, Thrill [7], is based on C++, offers a rich set of operations on distributed arrays such as map, reduce, sort, merge, and prefix-sum. It can fuse pipelines of local operations into tight loops optimized at compile time, considerably outperforming established tools such as Spark or Flink.…”

Section: P16 Massive Text Indices J Fischer (TU Dortmund) and P San (mentioning)

confidence: 99%
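The pipeline fusion described in the statement above can be illustrated with a minimal single-machine sketch in plain C++. This is not Thrill's API: `fused_map_filter_reduce`, `prefix_sum`, and `sum_of_even_squares` are hypothetical helper names chosen for illustration, and the sketch assumes a local `std::vector` in place of a distributed array. It only shows the general idea that, when the per-element operations are passed as template parameters, the compiler can inline them into one loop without materializing intermediate containers.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Thrill): applies map_fn, keeps the results
// accepted by keep_fn, and folds them with reduce_fn in ONE loop, so no
// intermediate container is created between the three steps. Because the
// callables are template parameters, the compiler can inline them.
template <typename T, typename Acc, typename MapFn, typename KeepFn,
          typename ReduceFn>
Acc fused_map_filter_reduce(const std::vector<T>& input, Acc init,
                            MapFn map_fn, KeepFn keep_fn, ReduceFn reduce_fn) {
    Acc acc = init;
    for (const T& x : input) {
        auto y = map_fn(x);            // map step
        if (keep_fn(y)) {              // filter step
            acc = reduce_fn(acc, y);   // reduce step
        }
    }
    return acc;
}

// Inclusive prefix sum over a local array (stand-in for a distributed one).
std::vector<std::int64_t> prefix_sum(const std::vector<std::int64_t>& input) {
    std::vector<std::int64_t> out(input.size());
    std::partial_sum(input.begin(), input.end(), out.begin());
    return out;
}

// Example pipeline: square every value, keep the even squares, add them up.
std::int64_t sum_of_even_squares(const std::vector<std::int64_t>& values) {
    return fused_map_filter_reduce(
        values, std::int64_t{0},
        [](std::int64_t x) { return x * x; },
        [](std::int64_t y) { return y % 2 == 0; },
        [](std::int64_t acc, std::int64_t y) { return acc + y; });
}
```

For instance, `sum_of_even_squares({1, 2, 3, 4})` evaluates to 20 (4 + 16), and `prefix_sum({1, 2, 3})` yields `{1, 3, 6}`. A distributed framework additionally has to partition the data and exchange partial results between machines, which this sketch deliberately leaves out.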

“…: Broccoli [4] for semantic search, GENO 3 for generic optimization code generation, NetworKit [28] for network analysis, STXXL [11] for external-memory computing, and Thrill [7] for distributed batch data processing. The priority programme also creates visibility by its national and international events (e.g., Summer/Winter schools in Chennai 2016 and Tel Aviv 2017).…”

Section: Scientific Output and SPP Collaborations (mentioning)

confidence: 99%

DFG Priority Programme SPP 1736: Algorithms for Big Data

Behdju, Meyer 2017. Künstl Intell

Volume, Velocity, and Variety are the three Vs commonly used to define the term big data. Simply put, those refer to the increasing amount of new data created, the increasing rate at which it is created, and the increasing number of different formats it has. At the same time, the three Vs describe challenges that require new algorithmic approaches. In order to tackle those challenges, the German Research Foundation established in 2013 the priority programme SPP 1736: Algorithms for Big Data. In this article we give a short overview on the research topics represented within this priority programme.
