In this paper we review the evaluation of relevance feedback methods for content-based image retrieval (CBIR) systems. We begin with an overview of current common practice and argue that evaluating relevance feedback methods differs from evaluating CBIR systems as a whole. Specifically, we identify the challenging issues that are particular to the evaluation of retrieval employing relevance feedback. Next, we propose three guidelines for moving toward more effective evaluation benchmarks. We focus particularly on assessing feedback methods more directly in terms of their goal of identifying the relevant target class from a small number of samples, and we show how to compensate for query targets of varying difficulty by measuring efficiency at generalization.