Steffen Zeuch scite author profile

Modern Stream Processing Engines (SPEs) process large data volumes under tight latency constraints. Many SPEs execute processing pipelines using message passing on shared-nothing architectures and apply a partition-based scale-out strategy to handle high-velocity input streams. Furthermore, many state-of-the-art SPEs rely on a Java Virtual Machine to achieve platform independence and speed up system development by abstracting from the underlying hardware. In this paper, we show that taking the underlying hardware into account is essential to exploit modern hardware efficiently. To this end, we conduct an extensive experimental analysis of current SPEs and SPE design alternatives optimized for modern hardware. Our analysis highlights potential bottlenecks and reveals that state-of-the-art SPEs are not capable of fully exploiting current and emerging hardware trends, such as multi-core processors and high-speed networks. Based on our analysis, we describe a set of design changes to the common architecture of SPEs to scale-up on modern hardware. We show that the single-node throughput can be increased by up to two orders of magnitude compared to state-of-the-art SPEs by applying specialized code generation, fusing operators, batch-style parallelization strategies, and optimized windowing. This speedup allows for deploying typical streaming applications on a single or a few nodes instead of large clusters.

show abstract

Generating custom code for efficient query execution on heterogeneous processors

Breβ

Köcher

Funke

et al. 2018

The VLDB Journal

View full text Add to dashboard Cite

Processor manufacturers build increasingly specialized processors to mitigate the effects of the power wall to deliver improved performance. Currently, database engines are manually optimized for each processor: A costly and error prone process.In this paper, we propose concepts to enable the database engine to perform per-processor optimization automatically. Our core idea is to create variants of generated code and to learn a fast variant for each processor. We create variants by modifying parallelization strategies, specializing data structures, and applying different code transformations.Our experimental results show that the performance of variants may diverge up to two orders of magnitude. Therefore, we need to generate custom code for each processor to achieve peak performance. We show that our approach finds a fast custom variant for multi-core CPUs, GPUs, and MICs.

show abstract

Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines

Monte

Zeuch

Rabl

et al. 2020

View full text Add to dashboard Cite

A survey of adaptive sampling and filtering algorithms for the internet of things

Giouroukis

Dadiani

Traub

et al. 2020

View full text Add to dashboard Cite

The Internet of Things (IoT) represents one of the fastest emerging trends in the area of information and communication technology. The main challenge in the IoT is the timely gathering of data streams from potentially millions of sensors. In particular, those sensors are widely distributed, constantly in transit, highly heterogeneous, and unreliable. To gather data in such a dynamic environment efficiently, two techniques have emerged over the last decade: adaptive sampling and adaptive filtering. These techniques dynamically reconfigure rates and filter thresholds to trade-off data quality against resource utilization. In this paper, we survey representative, state-of-the-art algorithms to address scalability challenges in real-time and distributed sensor systems. To this end, we cover publications from top peerreviewed venues for a period larger than 12 years. For each algorithm, we point out advantages, disadvantages, assumptions, and limitations. Furthermore, we outline current research challenges, future research directions, and aim to support readers in their decision process when designing extremely distributed sensor systems.

show abstract

Dynamic parameter allocation in parameter servers

et al. 2020

View full text Add to dashboard Cite

To keep up with increasing dataset sizes and model complexity, distributed training has become a necessity for large machine learning tasks. Parameter servers ease the implementation of distributed parameter management---a key concern in distributed training---, but can induce severe communication overhead. To reduce communication overhead, distributed machine learning algorithms use techniques to increase parameter access locality (PAL), achieving up to linear speed-ups. We found that existing parameter servers provide only limited support for PAL techniques, however, and therefore prevent efficient training. In this paper, we explore whether and to what extent PAL techniques can be supported, and whether such support is beneficial. We propose to integrate dynamic parameter allocation into parameter servers, describe an efficient implementation of such a parameter server called Lapse, and experimentally compare its performance to existing parameter servers across a number of machine learning tasks. We found that Lapse provides near-linear scaling and can be orders of magnitude faster than existing parameter servers.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Steffen Zeuch

Analyzing efficient stream processing on modern hardware

Generating custom code for efficient query execution on heterogeneous processors

Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines

A survey of adaptive sampling and filtering algorithms for the internet of things

Dynamic parameter allocation in parameter servers

Contact Info

Product

Resources

About