Diagnostic systems of several kinds are used to distinguish between two classes of events, essentially "signals" and "noise". For them, analysis in terms of the "relative operating characteristic" of signal detection theory provides a precise and valid measure of diagnostic accuracy. It is the only measure available that is uninfluenced by decision biases and prior probabilities, and it places the performances of diverse systems on a common, easily interpreted scale. Representative values of this measure are reported here for systems in medical imaging, materials testing, weather forecasting, information retrieval, polygraph lie detection, and aptitude testing. Though the measure itself is sound, the values obtained from tests of diagnostic systems often require qualification because the test data on which they are based are of unsure quality. A common set of problems in testing is faced in all fields. How well these problems are handled, or can be handled in a given field, determines the degree of confidence that can be placed in a measured value of accuracy. Some fields fare much better than others.
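The accuracy measure described above is, in practice, the area under the empirical ROC, which is insensitive to where a system sets its decision threshold. A minimal sketch of that computation follows; this is an illustrative implementation using only the standard library, not the article's own procedure, and the scores and "signal"/"noise" labels are invented for demonstration:

```python
# Sketch (illustrative, not from the article): build an empirical ROC
# from rating-scale scores and compute the area under it.

def roc_points(scores, labels):
    """Hit rate vs. false-alarm rate at every decision threshold."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Invented data: 1 = "signal" event, 0 = "noise" event.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
print(auc(roc_points(scores, labels)))  # prints 0.8125
```

The area equals the probability that a randomly chosen "signal" event receives a higher score than a randomly chosen "noise" event, which is why it places diverse systems on a common scale: 0.5 is chance performance and 1.0 is perfect discrimination.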
Tasks in which an observation is the basis for discriminating between two confusable alternatives are used widely in psychological experiments. Similar tasks occur routinely in many practical settings in which the objective is a diagnosis of some kind. Several indices have been proposed to quantify the accuracy of discrimination, whether the focus is on an observer's capacity or skill, on the usefulness of tools designed to aid an observer, or on the capability of a fully automated device. The suggestion treated here is that candidate indices be evaluated by calculating their relative operating characteristics (ROCs). The form of an index's ROC identifies the model of the discrimination process that is implied by the index, and that theoretical form can be compared with the form of empirical ROCs. If an index and its model yield a grossly different form of ROC than is observed in the data, then the model is invalid and the index will be unreliable. Most existing indices imply invalid models. A few indices are suitable; one is recommended.

Subjects in experiments on perception, learning, memory, and cognition are often required to make a series of fine discriminations. In a common method, a single stimulus is presented on each trial and the subject indicates which of two similar stimuli it is, or from which of two similar categories of stimuli it was drawn. In addition, in several practical settings, professional diagnosticians and prognosticators must say time and again which of two conditions, confusable at the moment of decision, exists or will exist. Among them are physicians, nondestructive testers, product inspectors, process-plant supervisors, weather forecasters, mineralogists, stockbrokers, librarians, survey researchers, and admissions officers.
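One standard way to compare an empirical ROC against a theoretical form is to replot its operating points on normal-deviate (z) axes, where a binormal ROC becomes a straight line. The sketch below illustrates that check; it is an assumption-laden demonstration rather than the article's procedure, and the operating points are invented:

```python
# Sketch (assumption, not the article's method): test whether empirical
# ROC points are consistent with the binormal model by fitting a line
# z(H) = a + b*z(F) on normal-deviate axes.
from statistics import NormalDist

z = NormalDist().inv_cdf  # probability -> normal deviate

# Invented operating points: (false-alarm rate F, hit rate H).
points = [(0.05, 0.40), (0.15, 0.65), (0.30, 0.82), (0.50, 0.93)]
z_pts = [(z(f), z(h)) for f, h in points]

# Ordinary least-squares fit of the line z(H) = a + b*z(F).
n = len(z_pts)
mx = sum(x for x, _ in z_pts) / n
my = sum(y for _, y in z_pts) / n
b = sum((x - mx) * (y - my) for x, y in z_pts) / \
    sum((x - mx) ** 2 for x, _ in z_pts)
a = my - b * mx
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
```

If the z-transformed points fall near the fitted line, the binormal model is tenable and its intercept and slope summarize the ROC; a theoretical ROC implied by some index that departs grossly from this line signals an invalid model.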
There is interest in knowing both how accurately the experimental subjects and professionals perform and how accurately their various tools perform, and a dozen or more indices of discrimination accuracy are in common use. In this article I cover a way of discriminating among those indices.

Acknowledgments. Theodore G. Birdsall, in the Electrical Engineering Department of The University of Michigan, first taught me about ROCs when he invented them; he has continued to share his thoughts and help me refine mine, and did so with this article. Charles E. Metz, of The University of Chicago's Radiology Department, contributed substantially as the structure of this article developed by describing his mathematical insights and providing derivations and proofs. Ian B. Mason, my correspondent in the Australian Bureau of Meteorology, helped advance some of the ideas treated here and extended my view of their generality. David J. Getty responded to soundings in the next office and assisted with mathematics. Others whose earlier work and comments on a draft contributed to this article are . As a referee, Neil Macmillan made helpful suggestions for exposition.