Modern software systems expose many runtime metrics, and stable statistical relationships exist among these metrics. A deviation from one of these stable relationships indicates a potential problem, which makes such relationships useful for diagnosing failures. Many modeling techniques can represent these relationships; however, which one to use is a question that has yet to be studied. In this paper we compare simple linear regression (SLR) with some of its more complex variants, including autoregressive regression and locally weighted regression. We consider component coverage, model robustness, diagnosis accuracy, and computational cost. Our study finds that while more flexible models can improve diagnosis accuracy, they do so at the cost of reduced robustness. In particular, we found that the autoregressive regression model with exogenous input (ARX) provides the most accurate diagnosis; however, it is the least robust of the techniques considered and the second most expensive. This study also finds that smoothing and other data transformations can noticeably improve the results of SLR, making SLR an efficient alternative to ARX.
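The idea of a stable relationship whose violation signals a problem can be sketched with a plain SLR between one pair of metrics. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes two hypothetical metrics (request rate and CPU utilization), a violation threshold of three times the training residual standard deviation, and a trailing moving average as the smoothing step. The helper names (`fit_slr`, `train_invariant`, `flag_violations`, `moving_average`) are illustrative, and the ARX and locally weighted variants are not shown.

```python
from statistics import mean, pstdev

# Minimal sketch, not the paper's implementation: one SLR relationship
# between two hypothetical metrics, with a residual-based violation check.

def fit_slr(x, y):
    """Ordinary least-squares fit of y ~ a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def residuals(x, y, a, b):
    """Observed minus predicted value for each paired sample."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def moving_average(series, window):
    """Trailing moving average (an assumed smoothing step); early points
    average over however many samples are available."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def train_invariant(x, y, k=3.0):
    """Fit the model and set the threshold to k times the population standard
    deviation of the training residuals (k = 3 is an assumption, not from the paper)."""
    a, b = fit_slr(x, y)
    threshold = k * pstdev(residuals(x, y, a, b))
    return a, b, threshold

def flag_violations(x, y, model):
    """Indices of samples whose residual breaks the learned relationship."""
    a, b, threshold = model
    return [i for i, r in enumerate(residuals(x, y, a, b)) if abs(r) > threshold]

if __name__ == "__main__":
    # Hypothetical training window: request rate vs. CPU utilization (%).
    req_rate = [10, 20, 30, 40, 50]
    cpu_util = [12, 22, 31, 42, 52]
    model = train_invariant(req_rate, cpu_util)
    # The third new sample breaks the learned relationship.
    print(flag_violations([15, 25, 35], [17, 27, 60], model))  # → [2]
```

Smoothing both series with `moving_average` before calling `train_invariant` is one simple way to try the kind of data transformation the abstract mentions. The window size is another assumption to tune per metric pair.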
Enterprise software systems (ESS) are becoming larger and increasingly complex. Failures in business-critical systems are expensive, leading to consequences such as loss of critical data, lost sales, customer dissatisfaction, and even lawsuits. Therefore, detecting failures and diagnosing their root cause in a timely manner is essential. Many studies suggest that a large fraction of failures encountered in practice are recurrent (i.e., they have been seen before). Fast and accurate detection of these failures can accelerate problem determination and thereby improve system reliability. To this end, we explore machine learning techniques, including the Naïve Bayes classifier, partially-supervised learning, and decision trees (using C4.5), to automatically recognize symptoms of recurrent faults and to derive detection rules from samples of log data. This work focuses on log files, since they are readily available and do not put any additional computational burden on the component generating the data. The methods explored in this work can aid the development of tools that assist support personnel in problem determination tasks. Instead of requiring operators to manually define patterns for identifying recurrent problems, such tools can be trained using prior, solved and unsolved cases from exist-
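To make the Naïve Bayes approach concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier that maps a log message to a previously seen fault label. It assumes lower-cased whitespace tokenization, Laplace smoothing with `alpha = 1`, and a small set of hypothetical log lines and fault labels (`TRAINING`, `db_pool_exhausted`, `out_of_memory`). The paper's actual feature extraction, as well as its partially-supervised and C4.5 variants, is not reproduced here.

```python
import math
from collections import Counter, defaultdict

# Minimal sketch, not the paper's implementation: multinomial Naive Bayes
# over whitespace-separated log tokens (tokenization is an assumption).

def tokenize(line):
    """Lower-case whitespace tokenization of one log message."""
    return line.lower().split()

class NaiveBayesLogClassifier:
    """Maps a log message to the most probable known fault label."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                         # Laplace smoothing constant
        self.class_counts = Counter()              # training messages per label
        self.token_counts = defaultdict(Counter)   # token frequencies per label
        self.vocab = set()

    def fit(self, samples):
        """samples: iterable of (log_message, fault_label) pairs."""
        for text, label in samples:
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_posterior(self, tokens, label):
        """Unnormalized log P(label | tokens); tokens never seen in training are skipped."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        counts = self.token_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, text):
        """Return the label with the highest posterior for this message."""
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))

# Hypothetical labeled history of previously diagnosed (recurrent) faults.
TRAINING = [
    ("ERROR db connection pool exhausted timeout", "db_pool_exhausted"),
    ("WARN db connection timeout retrying", "db_pool_exhausted"),
    ("ERROR java.lang.OutOfMemoryError heap space", "out_of_memory"),
    ("ERROR gc overhead limit exceeded heap", "out_of_memory"),
]

if __name__ == "__main__":
    clf = NaiveBayesLogClassifier().fit(TRAINING)
    print(clf.predict("ERROR connection timeout from db pool"))  # → db_pool_exhausted
    print(clf.predict("ERROR heap space low"))  # → out_of_memory
```

Because the classifier only exposes `fit` and `predict`, a decision tree or a partially-supervised learner could in principle be swapped in behind the same interface; those variants are not shown here.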