Proceedings of the ACM/SPEC International Conference on Performance Engineering 2020
DOI: 10.1145/3358960.3379132
Duet Benchmarking: Improving Measurement Accuracy in the Cloud

Cited by 16 publications (6 citation statements); references 19 publications.
“…To mitigate measurement bias, Georges et al [18] outlined a rigorous methodology for assessing the performance of Java programs, on which we base our measurement technique. Using the correct statistical techniques to assess performance is paramount, with confidence intervals estimated via bootstrap being the state of the art [8,9,27,33]. One of our stopping criteria is based on bootstrap confidence intervals, and our result quality evaluation uses them as well.…”
Section: Related Work
confidence: 99%
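The bootstrap confidence intervals mentioned in this excerpt can be illustrated with a minimal sketch (not the cited authors' code): resample the measured iteration times with replacement and take percentiles of the resampled means. The durations and the 95% confidence level below are hypothetical values chosen for illustration.

```python
import random
import statistics

def bootstrap_ci(samples, n_resamples=10_000, confidence=0.95):
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the original sample.
        resample = [random.choice(samples) for _ in samples]
        means.append(statistics.mean(resample))
    means.sort()
    lo_idx = int((1 - confidence) / 2 * n_resamples)
    hi_idx = int((1 + confidence) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical benchmark iteration times in milliseconds.
durations_ms = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 12.0]
low, high = bootstrap_ci(durations_ms)
print(f"95% CI for mean iteration time: [{low:.2f}, {high:.2f}] ms")
```

A stopping criterion of the kind the excerpt describes could, for example, keep adding measurements until the width of this interval falls below a chosen threshold.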
“…Related work quantifying the variability of short-running performance experiments in the cloud can, for example, be found in Iosup et al (2011), Leitner and Cito (2016), Abedi and Brecht (2017), Maricq et al (2018), or Laaber et al (2019). He et al (2019), He et al (2021), and Bulej et al (2020) propose methods for reducing the number of experiment repetitions while preserving high measurement accuracy. These methods differ from ours in that they aim to accurately measure the performance of a system, while for benchmarking scalability we only need to accurately assess whether a system fulfills specified SLOs.…”
Section: Scalability Measurement Methods
confidence: 99%
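The distinction this excerpt draws between accurate measurement and SLO-fulfillment checks can be sketched as two different stopping rules; the thresholds and interval values below are hypothetical and not taken from the cited papers.

```python
def needs_more_repetitions_for_accuracy(ci_low, ci_high, max_relative_width=0.05):
    """Accurate measurement: keep repeating until the CI is narrow relative to its midpoint."""
    midpoint = (ci_low + ci_high) / 2
    return (ci_high - ci_low) / midpoint > max_relative_width

def slo_verdict(ci_low, ci_high, slo_ms=200.0):
    """SLO check: a coarse interval may already decide whether the target is met."""
    if ci_high <= slo_ms:
        return "SLO fulfilled"
    if ci_low > slo_ms:
        return "SLO violated"
    return "undecided: measure more"

# A wide interval that still requires more repetitions for accurate measurement,
# yet is already conclusive for the SLO question.
print(needs_more_repetitions_for_accuracy(120.0, 160.0))  # True
print(slo_verdict(120.0, 160.0))                          # "SLO fulfilled"
```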
“…Cloud VMs often have different performance characteristics and are subject to random fluctuations, even when comparing two VMs of the same instance type [30]. Since we are only interested in a relative comparison of two SUT options and do not need absolute values, we can (largely) remove the noise resulting from cloud performance variability using duet benchmarking [6,7]. This is achieved by running two (or more) different SUT options, in our case different versions, and their application benchmarks on the same cloud VM simultaneously, with 50% of the resources assigned to each SUT option and benchmark.…”
Section: Application Benchmarks
confidence: 99%
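A minimal sketch of the relative comparison behind duet benchmarking (not the paper's actual tooling): because both versions run on the same VM at the same time, shared interference affects paired iterations similarly and largely cancels when only the ratio of their times is considered. The paired timings below are hypothetical.

```python
import statistics

# Hypothetical paired iteration times (ms): version A and version B measured
# simultaneously on one VM, each pinned to half of the cores. Slow pairs
# (e.g., the third and sixth iterations) reflect shared VM-level noise.
version_a = [105.0, 98.0, 140.0, 102.0, 99.0, 133.0]
version_b = [118.0, 109.0, 155.0, 114.0, 112.0, 149.0]

# Per-pair ratios cancel most of the shared noise; summarize with the geometric mean.
ratios = [a / b for a, b in zip(version_a, version_b)]
speedup = statistics.geometric_mean(ratios)
print(f"A takes {speedup:.2f}x the time of B (values below 1 mean A is faster)")
```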