Abstract. Web prefetching techniques have proven to be especially important for reducing web latencies and, consequently, a substantial body of work can be found in the open literature. In general, however, a fair comparison among the proposed prefetching techniques is not possible, for three main reasons: i) the underlying baseline system where prefetching is applied differs widely among the studies; ii) the workload used in the presented experiments is not the same; iii) different key performance metrics are used to evaluate their benefits. This paper focuses on the third reason. Our main concern is to identify which indexes are the most meaningful when studying the performance of different prefetching techniques. For this purpose, we propose a taxonomy based on three categories, which permits us to identify analogies and differences among the commonly used indexes. To check the relation between them in a more formal way, we run experiments and statistically estimate the correlation among a representative subset of those metrics. The statistical results help us to suggest which indexes should be selected when performing evaluation studies, depending on the different elements in the considered web architecture. The choice of the appropriate key metric is of paramount importance for a correct and representative study. As our experimental results show, depending on the metric used to check the system performance, results can not only vary widely but also lead to opposite conclusions.