Using black-box performance models to detect performance regressions under varying workloads: an empirical study

Liao, Li-Zhi; Chen, Jinfu; Li, Heng; Zeng, Yi; Shang, Weiyi; Guo, Jianmei; Sporea, Catalin; Toma, Andrei; Sajedi, Sarah

doi:10.1007/s10664-020-09866-z

Cited by 17 publications

(11 citation statements)

References 43 publications


“…However, for M_o, we cannot directly calculate the prediction error, since applying a model to its training data leads to biased (overly optimized) results. To address this issue, we apply the throw-one approach that is used in prior research (Liao et al, 2020). For each time period in the original set of workloads, we remove its data from the training data to rebuild the model and apply the rebuilt model on the time period.…”

Section: Results (mentioning)

confidence: 99%

Reducing the Length of Field-Replay Based Load Testing

Xia, Liao, Chen et al. 2024

IEEE Trans. Software Eng.


With the development of software, load testing has become more and more important. Load testing can ensure that a software system provides quality service under a certain load. Therefore, one of the common challenges of load testing is to design realistic workloads that represent the actual workload in the field. In particular, one of the most widely adopted and intuitive approaches is to directly replay the field workloads in the load testing environment, which is resource- and time-consuming. In this work, we propose an automated approach to reduce the length of load testing that is driven by replaying the field workloads. The intuition of our approach is: if the measured performance associated with a particular system behaviour is already stable, we can skip subsequent testing of this system behaviour to reduce the length of the field workloads. In particular, our approach first clusters execution logs that are generated during the system runtime to identify similar system behaviours during the field workloads. Then, we use statistical methods to determine whether the measured performance associated with a system behaviour has become stable. We evaluate our approach on three open-source projects (i.e., OpenMRS, TeaStore, and Apache James). The results show that our approach can significantly reduce the length of field workloads while the workloads-after-reduction produced by our approach are representative of the original set of workloads. More importantly, the load testing results obtained by replaying the workloads after the reduction have a high correlation and a similar trend with the original set of workloads. Practitioners can leverage our approach to perform realistic field-replay based load testing while saving the needed resources and time.
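The throw-one approach quoted in the citation statement for this paper (remove one time period from the training data, rebuild the model, apply it to the removed period) can be illustrated with a minimal sketch. The one-variable linear model, the `(load, cpu)` data shape, and the function names `fit_line` and `throw_one_errors` are assumptions made for illustration only; the cited papers use their own performance models and metrics.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (illustrative model only)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    # Fall back to a flat line when all x values are identical.
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def throw_one_errors(periods):
    """Absolute prediction error per time period, each predicted by a model
    rebuilt without that period's data.

    periods: list of (load, cpu) pairs, one pair per time period (hypothetical shape).
    """
    errors = []
    for i, (x_held, y_held) in enumerate(periods):
        rest = periods[:i] + periods[i + 1:]  # training data minus period i
        slope, intercept = fit_line([x for x, _ in rest], [y for _, y in rest])
        errors.append(abs(slope * x_held + intercept - y_held))
    return errors


if __name__ == "__main__":
    print(throw_one_errors([(0, 0), (1, 1), (2, 5)]))
```

Because every period is predicted by a model that never saw it, the resulting errors avoid the optimistic bias of evaluating a model on its own training data.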



“…Thus, we also use Cliff's delta to quantify the magnitude of the differences (a.k.a., effect sizes). Cliff's delta measures the effect size statistically and has been used in prior engineering studies (Kitchenham et al, 2002; Li, Chen, Shang, & Hassan, 2018; Liao et al, 2020).…”

Section: Statistical Analyses On Performance Evaluation Results (mentioning)

confidence: 99%
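For readers unfamiliar with the effect size named in the statement above, Cliff's delta is the probability that a value from one sample exceeds a value from the other, minus the probability of the reverse. The sketch below is a plain pairwise implementation; the function name `cliffs_delta` is chosen here for illustration and is not taken from the cited work.

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys)).

    Ranges from -1 (every x is below every y) to +1 (every x is above every y);
    0 means the two samples overlap completely.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


if __name__ == "__main__":
    print(cliffs_delta([4, 5, 6], [1, 2, 3]))
```

The pairwise loop is quadratic in the sample sizes, which is acceptable for small performance samples; larger samples would call for a rank-based computation.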

“…Logistic regression is a statistical model that uses a logit function to model a binary variable (the target variable) as a linear combination of the independent variables (Hosmer Jr, Lemeshow, & Sturdivant, 2013), which is widely used in software analytics (Shang et al, 2015; Tantithamthavorn et al, 2018). XGBoost is an efficient and accurate implementation of the gradient boosting algorithm, which is reported to perform better than other machine learning models in software engineering applications (Liao et al, 2020). The neural network model (Glorot, Bordes, & Bengio, 2011) used in our study consists of four layers and is trained with a batch size of 100 for 10 epochs.…”

Section: Splited Configuration Option Names (mentioning)

confidence: 99%

Performance regression detection in DevOps

Chen 2020

Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings

Self Cite


Performance is an important aspect of software quality. The goals of performance are typically defined by setting upper and lower bounds for response time and throughput of a system and physical level measurements such as CPU, memory, and I/O. To meet such performance goals, several performance-related activities are needed in development (Dev) and operations (Ops). Large software system failures are often due to performance issues rather than functional bugs. One of the most important performance issues is performance regression. Although performance regressions are not all bugs, they often have a direct impact on users' experience of the system. The detection of performance regressions in development and operations faces challenges. First, the detection of performance regression is conducted after the fact, i.e., after the system is built and deployed in the field or dedicated performance testing environments. Large amounts of resources are required to detect, locate, understand, and fix performance regressions at such a late stage in the development cycle. Second, even if we can detect a performance regression, it is extremely hard to fix it because other changes are applied to the system after the introduction of the regression. These challenges call for further in-depth analyses of performance regressions. In this thesis, to avoid performance regressions slipping into operation, we first perform an exploratory study on the source code changes that introduce performance regressions in order to understand the root causes of performance regressions at the source code level. Second, we propose an approach that automatically predicts whether a test would manifest performance regressions in a code commit. Most performance issues are related to configurations. Therefore, third, we propose an approach that predicts whether a configuration option manifests a performance variation issue. To help practitioners analyze system performance with operational data, we propose an approach to recovering field-representative workloads that can be used to detect performance regressions.


“…This complementary analysis is helpful to (i) demonstrate that DeLag is able to detect patterns correlated with latency deviations even on complex workloads, and (ii) to give a better idea to the reader about the capabilities of DeLag in supporting the analysis of specific latency behaviors. Similarly to recent studies [57], [58], we use load mixtures that involve multiple types of simulated users (i.e., load drivers), where each user type performs different classes of requests on the system. For example, in the Train Ticket case of study, some types of user may only visit the homepage and subsequently search trains for some random locations, while others may first login and then book random tickets.…”

Section: Methods (mentioning)

confidence: 99%

DeLag: Using Multi-Objective Optimization to Enhance the Detection of Latency Degradation Patterns in Service-Based Systems

Traini, Cortellessa 2023

IEEE Trans. Software Eng.


Performance debugging in production is a fundamental activity in modern service-based systems. The diagnosis of performance issues is often time-consuming, since it requires thorough inspection of large volumes of traces and performance indices. In this paper we present DeLag, a novel automated search-based approach for diagnosing performance issues in service-based systems. DeLag identifies subsets of requests that show, in the combination of their Remote Procedure Call execution times, symptoms of potentially relevant performance issues. We call such symptoms Latency Degradation Patterns. DeLag simultaneously searches for multiple latency degradation patterns while optimizing precision, recall and latency dissimilarity. Experimentation on 700 datasets of requests generated from two microservice-based systems shows that our approach provides better and more stable effectiveness than three state-of-the-art approaches and general purpose machine learning clustering algorithms. DeLag is more effective than all baseline techniques in at least one case study (with p ≤ 0.05 and non-negligible effect size). Moreover, DeLag outperforms in terms of efficiency the second and the third most effective baseline techniques on the largest datasets used in our evaluation (up to 22%).
