“…The number of forward and backward pointers is proportional to the number of cache lines in the machine, which is much smaller than the number of memory lines. Some optimizations to the initial proposal have been studied (for example, in [12] and [13]), and several commercial machines have been implemented using this kind of protocol, such as the Sequent NUMA-Q [14] and Convex Exemplar [15] multiprocessors. However, the important drawbacks these protocols entail have decreased their popularity, and from the SGI Origin 2000 [6] onward, most designs, such as Piranha [16], the AlphaServer GS320 [17], or the Cenju-4 [18], use memory-based directory protocols. Among others, these drawbacks include the increased latency of coherence transactions and increased occupancy of cache controllers, complex protocol implementations [5] and, most importantly, the need for larger cache states and extra bits for forward and backward pointers, which implies changing processor caches.…”
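
The scaling argument in the excerpt can be made concrete with a back-of-the-envelope calculation. The Python sketch below compares the bits spent on forward and backward pointers, one pair per cache line, with the bits a full-map bit vector would need, one presence bit per node for every memory line. All machine parameters (64 nodes, 64-byte lines, 4 MiB of cache and 1 GiB of memory per node) and both function names are hypothetical, chosen only for illustration; they are not taken from the cited machines. The sketch counts only the pointers mentioned in the text. Schemes of this kind typically also keep a head pointer per memory line, which is deliberately left out here.

```python
from math import ceil, log2


def pointer_bits_cache_based(num_nodes: int, cache_lines_per_node: int) -> int:
    """Bits for the forward and backward pointers stored alongside cache lines.

    Assumption: each cache line carries two pointers, each wide enough
    to name any node in the machine.
    """
    pointer_width = ceil(log2(num_nodes))
    return num_nodes * cache_lines_per_node * 2 * pointer_width


def sharing_bits_memory_based(num_nodes: int, memory_lines_per_node: int) -> int:
    """Bits for a full-map directory: one presence bit per node per memory line."""
    return num_nodes * memory_lines_per_node * num_nodes


# Hypothetical machine, for illustration only.
NODES = 64
LINE_BYTES = 64
CACHE_LINES_PER_NODE = (4 * 2**20) // LINE_BYTES   # 4 MiB cache per node
MEMORY_LINES_PER_NODE = 2**30 // LINE_BYTES        # 1 GiB memory per node

cache_bits = pointer_bits_cache_based(NODES, CACHE_LINES_PER_NODE)
memory_bits = sharing_bits_memory_based(NODES, MEMORY_LINES_PER_NODE)

print(f"cache-based pointer bits : {cache_bits}")
print(f"full-map directory bits  : {memory_bits}")
print(f"ratio (memory / cache)   : {memory_bits / cache_bits:.1f}")
```

Under these assumed parameters the pointer storage grows with the number of cache lines rather than memory lines, which is the advantage the excerpt describes. The drawback it also describes is visible in where the bits live: the pointers sit inside the processor caches, so the caches themselves have to be modified to hold them.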