Evaluation is a cornerstone of informatics, allowing us to objectively assess the strengths and weaknesses of a given tool. These assessments ultimately provide insight and feedback for improving a system and its approach in the future. Thus, this final chapter aims to provide an overview of the fundamental techniques used in informatics evaluations. Any quantitative evaluation starts with statistics and formal study design. A review of inferential statistical concepts is provided from the perspective of biostatistics (confidence intervals; hypothesis testing; error assessment, including sensitivity/specificity and receiver operating characteristics). Under study design, differences between observational investigations and controlled experiments are covered. Issues pertaining to population selection and study errors are briefly introduced. With these general tools, we then look to more specific informatics evaluations, using information retrieval (IR) systems and usability studies as examples to motivate further discussion. Methods for designing both types of evaluations and endpoint metrics are described in detail.
Biostatistics and Study Design: A Primer
Central to any evaluation is an understanding of statistics and of the systematic methods used to design experiments that are unbiased and that correctly answer questions of efficacy and impact. The focus of statistical analysis is the interpretation of a collection of data describing some phenomenon. Descriptive statistics (e.g., mean, median, mode) provide a summary of the collection, whereas inferential statistics aim to draw inferences about a population from a (random) sample. We start this chapter with a brief review of biostatistical concepts common to evaluation in biomedical informatics, leading into a discussion of study design and decision-making methods. Note that this section is not intended to be an instructional resource for statistics; rather, it assumes some basic statistical knowledge on the part of the reader. For more detailed coverage of foundational concepts, the reader is referred to [15].
Statistical Concepts
Inferential statistics is concerned with the estimation of parameters that describe a population. Common tasks include: point estimates from a distribution (e.g., calculating the mean from a random sample); interval estimates (e.g., confidence intervals); hypothesis testing; and prediction (or, in the context of biostatistics, medical decision making). Interval estimates and hypothesis testing are covered in the sections immediately below, and medical decision making is covered in a separate section.
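As a rough illustration of the first two tasks listed above, the following Python sketch draws a random sample from a simulated population and computes a point estimate of the mean together with an interval estimate. The population (simulated blood pressure readings), the sample size, and all variable names are assumptions made purely for illustration, and the interval uses a simple normal approximation rather than any particular method prescribed here.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated systolic blood pressure readings (mmHg).
population = [random.gauss(120, 15) for _ in range(100_000)]

# Draw a simple random sample (without replacement) from the population.
sample = random.sample(population, 200)

# Point estimate: the sample mean serves as an estimate of the population mean.
point_estimate = statistics.mean(sample)

# Interval estimate: normal-approximation 95% confidence interval for the mean.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5
z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96
ci_lower = point_estimate - z * standard_error
ci_upper = point_estimate + z * standard_error

print(f"point estimate of the mean: {point_estimate:.1f}")
print(f"approximate 95% CI: ({ci_lower:.1f}, {ci_upper:.1f})")
```

With a sample of this size the normal approximation is usually adequate; for small samples, a critical value from the t distribution would be the more conventional choice.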
Confidence Intervals
When inferring values about a population, there is an inherent question of how "good" the estimate might be. Confidence intervals indicate the reliability of an estimate, providing an upper and lower bound around an estimated parameter. For instance, assume that a drug test shows that 40% of subjects experience improvement; a 95% confidence interval on this statistic would mean th...