Trace-driven cache simulation is central to computer design. A trace is a very long sequence, x_1, ..., x_N, of references to lines (contiguous locations) from main memory. At the t-th instant, reference x_t is hashed into a set of cache locations, the contents of which are then compared with x_t. If at the t-th instant x_t is not present in the cache, then it is said to be a miss and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x_t present for the (t+1)-th instant. The problem of parallel simulation of a subtrace of N references directed to a C line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least-Recently-Used (LRU) policy, which regardless of the set size C runs in time O(log N) using N processors on the exclusive read, exclusive write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. We present timings of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference-based line replacement policies is considered, which includes LRU as well as the Least-Frequently-Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length N directed to a C line set runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well-suited for SIMD implementation.
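
To make the problem statement concrete, the following is a minimal sequential sketch of the LRU miss-determination task for a single C line cache set. It is not the parallel method summarized above; it is only a straightforward baseline that a parallel simulation would have to agree with. The function name `simulate_lru_set`, its parameters, and the example trace are assumptions introduced for illustration.

```python
from collections import OrderedDict


def simulate_lru_set(trace, capacity):
    """Sequentially simulate one LRU cache set (illustrative baseline only).

    trace    -- iterable of hashable line identifiers, all mapped to this set
    capacity -- number of lines C the set can hold
    Returns a list of booleans, True where the reference is a miss.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")

    # Keys are the resident lines, ordered from least to most recently used.
    resident = OrderedDict()
    misses = []
    for line in trace:
        if line in resident:
            # Hit: the line becomes the most recently used.
            resident.move_to_end(line)
            misses.append(False)
        else:
            # Miss: evict the least recently used line if the set is full,
            # then load the referenced line so it is present next instant.
            if len(resident) == capacity:
                resident.popitem(last=False)
            resident[line] = None
            misses.append(True)
    return misses


# Hypothetical trace directed to a 2-line set.
example_trace = ["a", "b", "a", "c", "b", "a"]
print(simulate_lru_set(example_trace, 2))
# [True, True, False, True, True, True]
```

This baseline takes time proportional to N on one processor and carries the set contents from one reference to the next, which is the sequential dependence a parallel simulation has to break.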