Cloud computing promises high scalability, flexibility, and cost-effectiveness to satisfy emerging computing requirements. To provision computing resources efficiently in the cloud, system administrators need to be able to characterize and predict workload on the Virtual Machines (VMs). In this paper, we use data traces obtained from a real data center to develop such capabilities. First, we search for repeatable workload patterns by exploring cross-VM workload correlations resulting from the dependencies among applications running on different VMs. Treating workload data samples as time series, we develop a co-clustering technique to identify groups of VMs that frequently exhibit correlated workload patterns, as well as the time periods in which these VM groups are active. Then, we introduce a method based on Hidden Markov Modeling (HMM) to characterize the temporal correlations in the discovered VM clusters and to predict variations of workload patterns. The experimental results show that our method not only helps to better understand group-level workload characteristics, but also makes more accurate predictions of workload changes in a cloud.
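As a rough illustration of what "cross-VM workload correlation" can mean in practice, the sketch below groups VMs whose workload time series are strongly correlated. It is not the co-clustering technique of the paper: it uses plain Pearson correlation over the whole trace and connected components above a threshold, and it does not identify the time periods in which a group is active. The VM names, the trace values, and the `threshold` parameter are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlated_groups(traces, threshold=0.9):
    """Group VMs linked by pairwise correlation >= threshold.

    Simplified stand-in for co-clustering: groups are the connected
    components of the "strongly correlated" graph, found with union-find.
    """
    names = list(traces)
    parent = {v: v for v in names}

    def find(v):
        # Follow parent links to the root, compressing the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(traces[a], traces[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for v in names:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical CPU-utilization samples (percent) for four VMs.
traces = {
    "vm1": [10, 20, 30, 20, 10],
    "vm2": [12, 22, 33, 21, 11],
    "vm3": [30, 10, 5, 10, 30],
    "vm4": [5, 5, 6, 5, 5],
}
groups = correlated_groups(traces)
```

With these toy traces, `vm1` and `vm2` rise and fall together and end up in one group, while `vm3` and `vm4` stay separate.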
IT problem management calls for quick identification of resolvers to reported problems. The efficiency of this process depends heavily on ticket routing, that is, transferring problem tickets among various expert groups in search of the right resolver. To achieve efficient ticket routing, a wise decision must be made at each transfer step to determine which expert group is likely to be, or to lead to, the resolver. In this paper, we address the possibility of improving ticket routing efficiency by mining ticket resolution sequences alone, without accessing ticket content. To demonstrate this possibility, we develop a Markov model that statistically captures the right decisions that have been made toward problem resolution, where the order of the Markov model is carefully chosen according to the conditional entropy obtained from ticket data. We also design a search algorithm, called Variable-order Multiple active State search (VMS), that generates ticket transfer recommendations based on our model. The proposed framework is evaluated on a large set of real-world problem tickets. The results demonstrate that VMS significantly improves on human decisions: problem resolvers can often be identified with fewer ticket transfers.
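The idea of learning transfer decisions from resolution sequences alone can be sketched as follows. This is a minimal fixed-order Markov model, not the VMS algorithm described above: it counts which expert group follows each recent history of groups, estimates the conditional entropy of the next group given that history (the quantity the abstract says guides the choice of model order), and recommends the most frequent next group. The group names and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def transition_counts(sequences, order=1):
    """Count how often each length-`order` history is followed by each group."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            history = tuple(seq[i - order:i])
            counts[history][seq[i]] += 1
    return counts


def conditional_entropy(counts):
    """Empirical H(next group | history) in bits."""
    total = sum(sum(nxt.values()) for nxt in counts.values())
    h = 0.0
    for nxt in counts.values():
        n_hist = sum(nxt.values())
        for n in nxt.values():
            # p(history, next) * log2 p(next | history)
            h -= (n / total) * log2(n / n_hist)
    return h


def recommend(counts, history):
    """Most frequent next group after `history`, or None if it was never seen."""
    nxt = counts.get(tuple(history))
    return nxt.most_common(1)[0][0] if nxt else None


# Hypothetical resolution sequences: each list is the chain of expert
# groups a ticket visited, ending with the group that resolved it.
sequences = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C"],
]

first_order = transition_counts(sequences, order=1)
entropy_1 = conditional_entropy(first_order)
next_group = recommend(first_order, ["B"])
```

Comparing `conditional_entropy` across several values of `order` gives a simple way to see how much extra history reduces uncertainty about the next transfer; how the paper turns that comparison into a final choice of order is not specified in the abstract.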
Motivated by the enormous amounts of data collected in a large IT service provider organization, this paper presents a method for quickly and automatically summarizing and extracting meaningful insights from the data. Termed Clustered Subset Selection (CSS), our method enables program-guided exploration of high-dimensional data matrices. CSS combines clustering and subset selection into a coherent and intuitive method for data analysis. In addition to a general framework, we introduce a family of CSS algorithms with different clustering components, such as k-means and Close-to-Rank-One (CRO) clustering, and different subset selection components, such as best rank-one approximation and Rank-Revealing QR (RRQR) decomposition. From an empirical perspective, we show that CSS achieves significant improvements over existing subset selection methods in terms of approximation error. Compared to existing subset selection techniques, CSS also provides additional insight about clusters and cluster representatives. Finally, we present a case study of program-guided data exploration using CSS on a large collection of IT service delivery data.