The performance characteristics of several classes of parallel computing systems are analyzed and compared using high-fidelity modeling and execution-driven simulation. Processor, bus, and network models are used to construct and simulate the architectures of symmetric multiprocessors (SMPs), clusters of uniprocessors, and clusters of SMPs. To demonstrate a typical use, the performance of ten systems is evaluated with a parallel matrix-multiplication algorithm. Because the performance of a parallel algorithm on a given architecture depends on its communication-to-computation ratio, the latencies of bus transactions, cache-coherence operations, and network transactions are analyzed to quantify each system's communication overhead. Such low-level performance attributes are difficult to measure on experimental testbed systems and difficult to represent accurately in purely analytical models, but they can be obtained readily and accurately from high-fidelity simulation models. This level of detail allows the designer to rapidly prototype and evaluate the performance of parallel and distributed systems.
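To make the communication-to-computation ratio concrete, the following is a minimal sketch for one simple, hypothetical decomposition. It is not necessarily the algorithm evaluated here. It assumes square n × n matrices, p workers with p dividing n, and a master-worker scheme. Each worker receives its n/p rows of A and all of B, computes its n/p rows of C = A·B, and returns them. The class name `RowBlockMatMul` and its methods are illustrative only. Counting words moved per worker against floating-point operations per worker gives a ratio of (p + 2)/(2n).

```python
from dataclasses import dataclass


@dataclass
class RowBlockMatMul:
    """Hypothetical master-worker, row-block decomposition of C = A @ B.

    Assumes n x n matrices and p workers, with p dividing n.
    """
    n: int
    p: int

    def __post_init__(self) -> None:
        if self.n % self.p != 0:
            raise ValueError("this sketch assumes p divides n")

    def flops_per_worker(self) -> int:
        # Each worker computes n/p rows of C; each of the (n/p) * n entries
        # needs n multiply-add pairs, i.e. 2n floating-point operations.
        return 2 * (self.n // self.p) * self.n * self.n

    def words_per_worker(self) -> int:
        # Words received: its n/p rows of A plus all n*n words of B.
        # Words sent back: its n/p rows of C.
        rows = self.n // self.p
        return rows * self.n + self.n * self.n + rows * self.n

    def comm_to_comp_ratio(self) -> float:
        # Words communicated per floating-point operation; equals (p + 2) / (2n).
        return self.words_per_worker() / self.flops_per_worker()


if __name__ == "__main__":
    model = RowBlockMatMul(n=64, p=4)
    print(model.comm_to_comp_ratio())  # prints 0.046875
```

Under this simplified model, the ratio grows with p for a fixed problem size and shrinks as n grows. This is one way to see why a system's bus, coherence, and network latencies weigh more heavily as processor count increases.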