This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
Acknowledgments

We thank Roger Nelsen of Lewis and Clark University, Bill Huber of Quantitative Decisions, Gang Xiang, Jan Beck, Scott Starks and Luc Longpré of University of Texas at El Paso, Troy Tucker and David Myers of Applied Biomathematics, Chuck Haas of Drexel University, Arnold Neumaier of Universität Wien, and Jonathan Lucero and Cliff Joslyn of Los Alamos National Laboratory for their collaboration. We also thank Floyd Spencer and Steve Crowder for reviewing the report and giving advice that substantially improved it. Jason C. Cole of Consulting Measurement Group also offered helpful comments. Finally, we thank Tony O'Hagan of University of Sheffield for challenging us to say where intervals come from.

The work described in this report was performed for Sandia National Laboratories under Contract Number 19094. William Oberkampf managed the project for Sandia. As always, the opinions expressed in this report and any mistakes it may harbor are solely those of the authors.

This report will live electronically at http://www.ramas.com/intstats.pdf. Please send any corrections and suggestions for improvements to scott@ramas.com.

Symbols

∅          the empty set
∞          infinity, or simply a very large number
Φ          standard normal cumulative distribution function
E          the true scalar value of the arithmetic mean of measurands in a data set
N          sample size
N(m, s)    normal distribution with mean m and standard deviation s
O( )       on the order of
P( )       probability of
R          real line
R*         extended real line, i.e., R ∪ {−∞, ∞}
S_N(X)     the fraction of N values in a data set that are at or below the magnitude X

Note on typography. Upper case letters are used to denote scalars and lower case letters are used for intervals. (The only major exception is that we still use lower case i, j, and k for integer indices.) For instance, the symbol E denotes the scalar value of the arithmetic mean of the measurands in a data set, even if we don't know its magnitude precisely. We might use ...