Capturing, indexing, clustering, and retrieving system history

Cohen, Ira L.; Zhang, Steve; Goldszmidt, Moisés; Symons, Julie; Kelly, Terence; Fox, Armando

doi:10.1145/1095809.1095821

Cited by 94 publications

(59 citation statements)

References 12 publications


“…Many research ideas in production system monitoring may be applicable for load testing analysis. For example, approaches (e.g., [20], [193], [194], [195]) have been proposed to build performance signatures based on the past failures, so that whenever such symptoms occur in the field, the problems can be detected and notified right away. Analogously, we can formulate our performance signature based on mining the past load testing history and use these performance signatures to detect recurrent problems in load tests.…”

Section: Summary and Open Problems (mentioning)

confidence: 99%

A Survey on Load Testing of Large-Scale Software Systems

Jiang

Hassan

2015

IEEE Trans. Software Eng.

144


Many large-scale software systems must service thousands or millions of concurrent requests. These systems must be load tested to ensure that they can function correctly under load (i.e., the rate of the incoming requests). In this paper, we survey the state of load testing research and practice. We compare and contrast current techniques that are used in the three phases of a load test: (1) designing a proper load, (2) executing a load test, and (3) analyzing the results of a load test. This survey will be useful for load testing practitioners and software engineering researchers with interest in the load testing of large-scale software systems.
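The statement quoted above describes performance signatures that are built from past failures and used to flag recurring problems as soon as they reappear. Below is a minimal sketch of that idea. It assumes each past incident has already been reduced to a per-metric normal/abnormal vector and given a diagnosis label. The names Incident, to_signature and match_recurrent, the k-sigma rule and the Hamming-distance cutoff are all hypothetical. None of them come from Cohen et al. or the survey.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record of a past, already-diagnosed problem.
    label: str                   # diagnosis, e.g. "db-lock-contention"
    signature: tuple[int, ...]   # per-metric state: 1 = abnormal, 0 = normal


def to_signature(metrics, baseline_mean, baseline_std, k=3.0):
    """Mark a metric abnormal (1) if it deviates more than k baseline std devs."""
    return tuple(
        1 if abs(x - mu) > k * std else 0
        for x, mu, std in zip(metrics, baseline_mean, baseline_std)
    )


def hamming(a, b):
    """Number of metrics whose state differs between two signatures."""
    return sum(x != y for x, y in zip(a, b))


def match_recurrent(signature, history, max_distance=1):
    """Return the label of the closest past incident, or None if none is close enough."""
    best = min(history, key=lambda inc: hamming(signature, inc.signature), default=None)
    if best is None or hamming(signature, best.signature) > max_distance:
        return None
    return best.label


# Usage with made-up numbers: metrics 0 and 2 are far outside their baselines.
history = [Incident("db-lock-contention", (1, 0, 1, 0)), Incident("gc-pauses", (0, 1, 0, 1))]
sig = to_signature([250.0, 40.0, 95.0, 10.0], [100.0, 40.0, 50.0, 10.0], [20.0, 5.0, 10.0, 2.0])
print(sig, match_recurrent(sig, history))  # (1, 0, 1, 0) db-lock-contention
```

Matching by Hamming distance allows a new occurrence to differ from a stored signature in a few metrics and still be recognized. The max_distance cutoff keeps genuinely new problems from being forced onto an old label.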


“…Xu et al [5] attempt to identify problems with production logs of distributed systems, and suggest methodologies to enhance the performance of mining the logs by automatic matching of log statements. Cohen et al [12] describe how failure prediction models are built to identify and study the root-causes of failures. They propose techniques to categorize the faulty execution results found in the logs, before building failure prediction models based on them.…”

Section: B. Online Service Failure Prediction (mentioning)

confidence: 99%

“…We do not implement these predictors, we use commercially available software that implement them. We choose these algorithms because of their wide use [9], [12], [5], [14] and because they represent different approaches to data mining and machine learning; they represent Artificial Intelligence, Clustering Analysis, Statistical Methods, and Decision Rules respectively [2], [3], [22].…”

Section: Performance Metrics (mentioning)

confidence: 99%

Real-time failure prediction in online services

Shatnawi

Hefeeda

2015

2015 IEEE Conference on Computer Communications (INFOCOM)


Current data mining techniques used to create failure predictors for online services require massive amounts of data to build, train, and test the predictors. These operations are tedious, time-consuming, and are not done in real-time. Also, the accuracy of the resulting predictor is highly compromised by changes that affect the environment and working conditions of the predictor. We propose a new approach to creating a dynamic failure predictor for online services in real-time and keeping its accuracy high during the service's run-time changes. We use synthetic transactions during the run-time lifecycle to generate current data about the service. This data is used in its ephemeral state to build, train, test, and maintain an up-to-date failure predictor. We implemented the proposed approach in a large-scale online ad service that processes billions of requests each month in six data centers distributed across three continents. We show that the proposed predictor is able to maintain failure prediction accuracy as high as 86% during online service changes, whereas the accuracy of the state-of-the-art predictors may drop to less than 10%.
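One way to picture a predictor that stays current through service changes is to train it only on a bounded window of recent synthetic-transaction outcomes. The sketch below is a deliberately simple stand-in that uses per-key failure rates over a sliding window. It is not one of the data-mining predictors the paper evaluates. The class name, the (data center, version) keys, the window size and the threshold are all hypothetical.

```python
from collections import deque


class SlidingWindowFailurePredictor:
    """Hypothetical sketch: failure rates estimated only from the most recent probes."""

    def __init__(self, window_size=1000, threshold=0.5):
        # A deque with maxlen silently drops the oldest entry once it is full,
        # so outcomes from before a service change age out automatically.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def observe(self, key, failed):
        """Record one synthetic transaction, keyed e.g. by (data_center, version)."""
        self.window.append((key, bool(failed)))

    def failure_rate(self, key):
        """Fraction of recent probes for this key that failed, or None if none seen."""
        outcomes = [failed for k, failed in self.window if k == key]
        if not outcomes:
            return None
        return sum(outcomes) / len(outcomes)

    def predict_failure(self, key):
        """Predict failure when the recent failure rate reaches the threshold."""
        rate = self.failure_rate(key)
        return rate is not None and rate >= self.threshold


# Usage: after a deployment, old probe results are evicted from the window.
p = SlidingWindowFailurePredictor(window_size=4)
for _ in range(2):
    p.observe(("dc1", "v1"), False)
for _ in range(2):
    p.observe(("dc1", "v2"), True)
print(p.predict_failure(("dc1", "v2")))  # True
```

Recomputing the rate on every call costs time proportional to the window size. Keeping incremental per-key counts would avoid rescanning the window, at the cost of extra bookkeeping on eviction.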


“…All of these projects compute dependencies, and therefore cannot deal well with missing dependency information or resource contention. Much of this dependency modeling work requires that the system be actively perturbed by instrumentation or by probing [5,6,9,10,19]. Unfortunately, for many important systems, no such modifications are possible (for reasons of performance, administration, or cost).…”

Section: Related Work (mentioning)

confidence: 99%

Using correlated surprise to infer shared influence

Oliner

Kulkarni

Aiken

2010

2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN)


We propose a method for identifying the sources of problems in complex production systems where, due to the prohibitive costs of instrumentation, the data available for analysis may be noisy or incomplete. In particular, we may not have complete knowledge of all components and their interactions. We define influences as a class of component interactions that includes direct communication and resource contention. Our method infers the influences among components in a system by looking for pairs of components with time-correlated anomalous behavior. We summarize the strength and directionality of shared influences using a Structure-of-Influence Graph (SIG). This paper explains how to construct a SIG and use it to isolate system misbehavior, and presents both simulations and in-depth case studies with two autonomous vehicles and a 9024-node production supercomputer.
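The core step of this method is finding pairs of components whose anomalous behavior is correlated in time. It can be sketched as a lagged correlation between per-component anomaly signals. This is an illustration under assumptions. The absolute z-score below is a stand-in surprise measure and is not necessarily the one the paper uses. The function names and the max_lag parameter are hypothetical.

```python
import math


def anomaly_signal(series):
    """Hypothetical surprise measure: absolute z-score of each sample vs. the series."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n
    return [abs(x - mean) / std for x in series]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def shared_influence(a, b, max_lag=3):
    """Correlate a's anomalies at time t with b's at t + lag, for each lag in range.

    Returns (lag, correlation) for the strongest positive correlation, or (0, 0.0)
    if none is positive. A positive lag suggests a's surprises precede b's.
    Assumes a and b are sampled on the same clock and have the same length.
    """
    sa, sb = anomaly_signal(a), anomaly_signal(b)
    best = (0, 0.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = sa[: len(sa) - lag], sb[lag:]
        else:
            xs, ys = sa[-lag:], sb[: len(sb) + lag]
        r = pearson(xs, ys)
        if r > best[1]:
            best = (lag, r)
    return best


# Usage: component b spikes two steps after component a does.
a = [0.0] * 20
a[5] = a[12] = 10.0
b = [0.0] * 20
b[7] = b[14] = 10.0
print(shared_influence(a, b))  # lag 2, correlation close to 1.0
```

The size of the returned correlation stands in for influence strength, and the sign of the lag suggests directionality. These are the two quantities the abstract says a SIG summarizes for each pair of components.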
