Computing Population Variance and Entropy under Interval Uncertainty: Linear-Time Algorithms

Xiang, Gang; Ceberio, Martine; Kreinovich, Vladik

doi:10.1007/s11155-007-9045-6

Cited by 17 publications

(12 citation statements)

References 16 publications


“…The table below summarizes the computability results for statistics of data sets containing intervals that have been established in this report and elsewhere (Ferson et al 2002a,b;2004a,b;2005a,b,c;Wu et al 2003;Xiang 2006;Xiang et al 2006;Dantsin et al 2006;Xiang et al 2007a). …”

Section: Computability Of Interval Statistics (mentioning)

confidence: 93%

“…This also means that the midpoints of the original intervals can likewise be ordered. Starks et al (2004;Dantsin et al 2006) showed that, in such an ordering, the largest possible variance is attained by a configuration of scalar values within the respective intervals that occupy the left endpoints for the first K intervals and the right endpoints for the remaining intervals, for some integer K. A brute force search for the index value K that maximizes variance can be found with an algorithm that runs in O(N log N) time (Dantsin et al 2006), but by exploiting observations about how tentative variance calculations change at nearby corners of the Cartesian space formed by the input intervals, Xiang et al (2007a) were able to derive an even better algorithm that runs in linear time O(N), which is given below.…”

Section: Arrangeable Data (mentioning)

confidence: 99%
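The corner-search idea in the statement above can be illustrated with a short Python sketch that tries every split index K using prefix and suffix sums. This is not the linear-time algorithm of Xiang et al.: it sorts first, so it runs in O(N log N). It also assumes that no interval is nested inside another, so that ordering by midpoint gives the ordering the statement refers to. The function name `max_variance_arrangeable` is made up for this example.

```python
from itertools import accumulate


def max_variance_arrangeable(intervals):
    """Largest population variance over all choices x_i in [lo_i, hi_i].

    Assumption (not checked): no interval is nested inside another, so the
    maximum is attained with left endpoints for the first K intervals and
    right endpoints for the rest, in midpoint order.
    """
    if not intervals:
        raise ValueError("need at least one interval")
    # Order by midpoint (lo + hi is twice the midpoint).
    ivs = sorted(intervals, key=lambda iv: iv[0] + iv[1])
    n = len(ivs)

    # Prefix sums over left endpoints and their squares.
    lo_sum = [0.0] + list(accumulate(lo for lo, _ in ivs))
    lo_sq = [0.0] + list(accumulate(lo * lo for lo, _ in ivs))

    # Suffix sums over right endpoints and their squares.
    hi_sum = [0.0] * (n + 1)
    hi_sq = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        hi = ivs[i][1]
        hi_sum[i] = hi_sum[i + 1] + hi
        hi_sq[i] = hi_sq[i + 1] + hi * hi

    best = 0.0
    for k in range(n + 1):
        # First k values at left endpoints, remaining n - k at right endpoints.
        total = lo_sum[k] + hi_sum[k]
        total_sq = lo_sq[k] + hi_sq[k]
        best = max(best, total_sq / n - (total / n) ** 2)
    return best
```

For the intervals [0, 1], [0.5, 1.5], [2, 3], the best split is K = 2 (values 0, 0.5, 3), which gives a variance of 31/18.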


Experimental uncertainty estimation and statistics for data having interval uncertainty.

Kreinovich, Oberkampf, Ginzburg, et al. 2007

Self Cite

144


This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.

Acknowledgments. We thank Roger Nelsen of Lewis and Clark University, Bill Huber of Quantitative Decisions, Gang Xiang, Jan Beck, Scott Starks and Luc Longpré of University of Texas at El Paso, Troy Tucker and David Myers of Applied Biomathematics, Chuck Haas of Drexel University, Arnold Neumaier of Universität Wien, and Jonathan Lucero and Cliff Joslyn of Los Alamos National Laboratory for their collaboration. We also thank Floyd Spencer and Steve Crowder for reviewing the report and giving advice that substantially improved it. Jason C. Cole of Consulting Measurement Group also offered helpful comments. Finally, we thank Tony O'Hagan of University of Sheffield for challenging us to say where intervals come from. The work described in this report was performed for Sandia National Laboratories under Contract Number 19094. William Oberkampf managed the project for Sandia. As always, the opinions expressed in this report and any mistakes it may harbor are solely those of the authors. This report will live electronically at http://www.ramas.com/intstats.pdf. Please send any corrections and suggestions for improvements to scott@ramas.com.

Symbols.
∅: the empty set
∞: infinity, or simply a very large number
Φ: standard normal cumulative distribution function
E: the true scalar value of the arithmetic mean of measurands in a data set
N: sample size
N(m, s): normal distribution with mean m and standard deviation s
O(·): on the order of
P(·): probability of
R: real line
R*: extended real line, i.e., R ∪ {−∞, ∞}
S_N(X): the fraction of N values in a data set that are at or below the magnitude X

Note on typography. Upper case letters are used to denote scalars and lower case letters are used for intervals. (The only major exception is that we still use lower case i, j, and k for integer indices.) For instance, the symbol E denotes the scalar value of the arithmetic mean of the measurands in a data set, even if we don't know its magnitude precisely. We might use ...


“…In the case of probabilistic uncertainty, there is a well-established way to gauge the amount of uncertainty: namely, the entropy [9], [18] …”

Section: Formulation Of the Problem In Precise Terms (mentioning)

confidence: 99%
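For readers unfamiliar with the measure mentioned above, here is a minimal Python sketch of Shannon entropy for a finite probability distribution, assuming that this is the entropy the statement refers to. The function name `shannon_entropy` and the choice of base-2 logarithms (entropy in bits) are choices made for this example, not something stated in the citing paper.

```python
import math


def shannon_entropy(probabilities):
    """Entropy in bits of a finite distribution: -sum(p * log2(p)).

    Zero-probability outcomes contribute nothing, by the usual convention
    that 0 * log(0) is taken as 0.
    """
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

A uniform distribution over two outcomes has entropy of one bit, and a distribution concentrated on a single outcome has entropy zero, which matches the reading of entropy as an amount of uncertainty.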

Coming up with a good question is not easy: A proof

Lorkowski, Longpré, Kosheleva, et al. 2015

2015 Annual Conference of the North American Fuzzy Information Processing Society (NAFIPS) Held Jointly With 2015 5th World Con


Abstract: Ability to ask good questions is an important part of learning skills. Coming up with a good question, a question that can really improve one's understanding of the topic, is not easy. In this paper, we prove, on the example of probabilistic and fuzzy uncertainty, that the problem of selecting a good question is indeed hard.

I. FORMULATION OF THE PROBLEM

Asking good questions is important. Even after a very good lecture, some parts of the material remain not perfectly clear. A natural way to clarify these parts is to ask questions to the lecturer. Ideally, we should be able to ask a question that immediately clarifies the desired part of the material. Coming up with such good questions is an important part of the learning process; it is a skill that takes a long time to master.

Coming up with good questions is not easy: an empirical fact. Even for experienced people, it is not easy to come up with a good question, i.e., with a question that will maximally decrease uncertainty.

What we do in this paper. In this paper, we prove that the problem of designing a good question is indeed computationally difficult (NP-hard). We will show this both for probabilistic and for fuzzy uncertainty. Specifically, we will prove NP-hardness for the simplest types of questions: "yes"-"no" questions, for which the answer is "yes" or "no". Since already designing such simple questions is NP-hard, any more general problem (allowing more complex problems) is NP-hard as well.

II. TOWARDS DESCRIBING THE PROBLEM IN PRECISE TERMS: GENERAL CASE


“…The variance (2) is, in general, not monotonic; so, for the variance, the problem of computing the range $[\underline{V}, \overline{V}]$ under interval uncertainty is more complex. Specifically, it turns out that while the lower endpoint $\underline{V}$ can be computed in linear time [8], the problem of computing $\overline{V}$ is, in general, NP-hard [1], [2].…”

Section: Formulation Of the Problem (mentioning)

confidence: 99%
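To make the lower endpoint of the variance range concrete, the following Python sketch computes it by bisection. It is not the linear-time algorithm cited as [8] in the statement; it is a simpler numerical substitute. It relies on the fact that the variance is smallest when every value is pulled as close as possible to a common point μ, that is, x_i is μ clamped to its interval, with μ equal to the mean of the clamped values. The function name `min_variance` and the `iterations` parameter are hypothetical.

```python
def min_variance(intervals, iterations=100):
    """Smallest population variance over all choices x_i in [lo_i, hi_i].

    Bisection on mu for the fixed point mean(clamp(mu)) == mu; the quantity
    mean(clamp(mu)) - mu never increases as mu grows, so bisection applies.
    """
    if not intervals:
        raise ValueError("need at least one interval")
    n = len(intervals)

    def clamped(mu):
        # Move mu into each interval: the closest feasible value to mu.
        return [min(max(mu, lo), hi) for lo, hi in intervals]

    left = min(lo for lo, _ in intervals)
    right = max(hi for _, hi in intervals)
    for _ in range(iterations):
        mid = (left + right) / 2
        if sum(clamped(mid)) / n > mid:
            left = mid   # fixed point lies to the right of mid
        else:
            right = mid  # fixed point lies at or to the left of mid

    xs = clamped((left + right) / 2)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n
```

For the intervals [0, 1], [0.5, 1.5], [2, 3], the fixed point is μ = 1.5 with values 1, 1.5, 2, so the minimum variance is 1/6. When all intervals share a common point, the result is zero.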

Estimating mean under interval uncertainty and variance constraint

Kamali, Longpré, Koshelev 2011

2011 Annual Meeting of the North American Fuzzy Information Processing Society


Abstract: In many practical situations, we have a sample of objects of a given type. When we measure the values of a certain quantity for these objects, we get a sequence of values $x_1, \ldots, x_n$. When the sample is large enough, then the arithmetic mean $E$ of the values $x_i$ is a good approximation for the average value of this quantity for all the objects from this class.

The values $x_i$ come from measurements, and measurement is never absolutely accurate. Often, the only information that we have about the measurement error is the upper bound $\Delta_i$ on this error. In this case, once we have the measurement result $\tilde{x}_i$, the condition that $|\tilde{x}_i - x_i| \le \Delta_i$ implies that the actual (unknown) value $x_i$ belongs to the interval $[\tilde{x}_i - \Delta_i, \tilde{x}_i + \Delta_i]$.

In addition, we often know the upper bound $V_0$ on the variance $V$ of the actual values, e.g., we know that the objects belong to the same species, and we know that within-species differences cannot be too high. In such cases, to estimate the average over the class, we need to find the range of possible values of the mean under the constraints that each $x_i$ belongs to the given interval $[\underline{x}_i, \overline{x}_i]$ and that the variance $V(x_1, \ldots, x_n)$ is bounded by a given value $V_0$. In this paper, we provide efficient algorithms for computing this range.
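As a baseline for the problem this abstract describes, the Python sketch below builds the intervals from measured values and error bounds and returns the range of the mean when no variance constraint is imposed. It does not implement the paper's variance-constrained algorithms. Because the mean increases with every x_i, its smallest and largest values are reached at the left and right endpoints respectively. The function name `mean_range` is hypothetical.

```python
def mean_range(measured, deltas):
    """Range of the arithmetic mean when x_i lies in [m_i - d_i, m_i + d_i].

    No variance bound is applied; adding one could only narrow this range.
    """
    if not measured or len(measured) != len(deltas):
        raise ValueError("need equally many measurements and error bounds")
    if any(d < 0 for d in deltas):
        raise ValueError("error bounds must be non-negative")
    n = len(measured)
    lower = sum(m - d for m, d in zip(measured, deltas)) / n
    upper = sum(m + d for m, d in zip(measured, deltas)) / n
    return lower, upper
```

With measurements 1, 2, 3 and a common error bound of 0.5, the mean can be anywhere between 1.5 and 2.5.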
