This article describes a method for analyzing, modeling, and simulating a two-level arrival-counting process. This method is particularly appropriate when the number of independent processes is large, as is the case in our motivating application, which requires analyzing and representing computer file system trace data for activity on nearly 8,000 files. The method is also applicable to network trace data characterizing communication patterns between pairs of computers. We apply cluster analysis to separate the arrival process into groups or bursts of activity on a file. We then characterize the arrival process in terms of the time between bursts of activity on a file, the time between file events within bursts, and the number of events in a burst. Finally, we model these three components individually, then reassemble the results to produce a synthetic trace generator. To gauge the effectiveness of this method, we use synthetically generated (simulated) trace data produced in this way to drive a discrete-event simulation of a distributed replicated file system. We compare the results of the simulation driven by the synthetic trace with the same simulation driven by the original trace data, and conclude that the synthetic data capture the essential characteristics of the empirical trace.
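The decomposition into bursts and the three derived components can be illustrated with a minimal sketch. It is not the clustering procedure or stopping rule used in the article, which the abstract does not specify. Instead it assumes a fixed, hypothetical `gap_threshold`: any gap longer than the threshold starts a new burst, which is what single-linkage clustering of points on a line produces for a fixed distance cutoff. The function names, and the choice to measure the time between bursts from the last event of one burst to the first event of the next, are likewise assumptions.

```python
from typing import List, Tuple


def split_into_bursts(times: List[float], gap_threshold: float) -> List[List[float]]:
    """Group event times for one file into bursts.

    Assumption: a gap larger than gap_threshold starts a new burst.
    The threshold stands in for the article's cluster analysis.
    """
    bursts: List[List[float]] = []
    for t in sorted(times):
        if bursts and t - bursts[-1][-1] <= gap_threshold:
            bursts[-1].append(t)   # close enough: extend the current burst
        else:
            bursts.append([t])     # gap too large (or first event): new burst
    return bursts


def burst_components(
    bursts: List[List[float]],
) -> Tuple[List[float], List[float], List[int]]:
    """Return (time between bursts, time between events within bursts, burst sizes)."""
    sizes = [len(b) for b in bursts]
    within = [b[i + 1] - b[i] for b in bursts for i in range(len(b) - 1)]
    # Assumption: measured from the last event of a burst to the first of the next.
    between = [bursts[k + 1][0] - bursts[k][-1] for k in range(len(bursts) - 1)]
    return between, within, sizes


if __name__ == "__main__":
    events = [0, 1, 2, 20, 21, 60]          # illustrative event times, not trace data
    bursts = split_into_bursts(events, gap_threshold=5)
    print(bursts)
    print(burst_components(bursts))
```

Each of the three returned lists is a univariate sample, so it can be summarized or fitted independently of the other two.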
A method for analyzing, modeling, and simulating a two-level arrival-counting process is presented. This method is particularly appropriate when the number of independent processes is large. The initial motivation for this method was the need to analyze and represent computer file system trace data that involves activity on some 8,000 files. The method is also applicable to network trace data characterizing communication patterns between pairs of computers. Cluster analysis with a novel stopping rule is used to decompose the arrival process into groups. The resulting clusters can be characterized using the time between clusters, the time between arrivals within clusters, and the size of each cluster. Each of these three components is then analyzed as a univariate problem. The effectiveness of this method is measured by comparing the output of a simulation driven by the original trace data to the output of the same simulation driven by the input model.
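One way the three univariate components might be reassembled into an input model is sketched below. The abstract does not say how each component is modeled, so this sketch simply resamples from the observed values of each component (an empirical bootstrap), which is an assumption rather than the article's approach. The function `generate_trace` and its parameters are hypothetical names introduced for illustration.

```python
import random
from typing import List, Optional, Sequence


def generate_trace(
    between: Sequence[float],
    within: Sequence[float],
    sizes: Sequence[int],
    n_bursts: int,
    seed: Optional[int] = None,
    start: float = 0.0,
) -> List[float]:
    """Generate synthetic event times for one file.

    Assumption: each component is modeled by resampling its observed values.
    between: observed times between clusters
    within:  observed times between arrivals within clusters
    sizes:   observed cluster sizes (each at least 1)
    """
    if not sizes:
        raise ValueError("sizes must contain at least one observed cluster size")
    rng = random.Random(seed)
    t = start
    trace: List[float] = []
    for k in range(n_bursts):
        if k > 0:
            t += rng.choice(between)       # gap from the previous cluster
        trace.append(t)                    # first arrival of this cluster
        for _ in range(rng.choice(sizes) - 1):
            t += rng.choice(within)        # remaining arrivals in the cluster
            trace.append(t)
    return trace


if __name__ == "__main__":
    # Illustrative component samples, not values from the trace data.
    print(generate_trace(between=[18, 39], within=[1, 1, 1], sizes=[3, 2, 1],
                         n_bursts=4, seed=1))
```

The `within` and `between` samples must be non-empty whenever a cluster larger than one arrival or more than one cluster is requested; fitted parametric distributions could replace `rng.choice` without changing the structure of the generator.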