A recent study conducted by our research group revealed that there may be a significant "laboratory effect" in retrospective observer performance studies, one that may in turn limit the relevance to the clinical environment of inferences generated by these studies (1). Although we found considerable consistency among the performance levels of the readers regardless of the specific method or rating scale used (i.e., binary, receiver operating characteristic (ROC), or free-response ROC (FROC)) (2,3), our study showed that radiologists may perform significantly differently in the clinic than in the laboratory when reading the very same cases. The differences were reflected both in overall performance levels (e.g., sensitivity and specificity) and, perhaps more important, in the variability, or "spread", among the observers' performance levels (1). Although the results of our study should certainly be validated experimentally in other studies before general acceptance, there is a reasonably solid rationale for the outcome we observed (1). Indeed, radiologists may perform differently in the laboratory in ways that are not a priori predictable; therefore, differences in their behavior cannot always be completely accounted for during retrospective studies. Even attempting to duplicate seemingly simple conditions, such as practice guidelines (e.g., aim for a 10% recall rate), during the experiment ultimately may not reflect actual behavior in the clinic. Clinical decisions that affect patient management cannot be duplicated in laboratory studies. Because differences in behavior are difficult to account for in retrospective observer performance studies, we, the investigators, have to ask ourselves: what next? How do we proceed with appropriate evaluations of new technologies and practices in a manner that is both practical and, at the same time, clinically relevant?

In the 1980s, when a group of investigators, including myself, was working on stroke models and absolute measurements of regional and local cerebral perfusion using non-radioactive xenon computed tomography (XeCT), the results of the Extracranial-Intracranial (EC/IC) Bypass Study (4) were published. This randomized trial assessed the efficacy of a seemingly ideal surgical procedure that, at the time, was being performed in rapidly growing numbers around the world. The trial showed that the procedure was not as beneficial as most had originally perceived; in effect, it was more harmful than beneficial in the general population to which it was then being applied. This was a great surprise and a significant disappointment to the field of micro-vascular neurosurgery. As members of a group of investigators interested in this very question, we strongly believed we had a very "appropriate" way, namely XeCT perfusion measurements, to select a sub-set of the population in question who "should clearly benefit" from this surgical procedure, despite the overall negative results of the EC/IC Bypass Study. Shortly a...