Abstract. In this paper, we present a new algorithm to estimate a regression function in a fixed design regression model, by piecewise (standard and trigonometric) polynomials computed with an automatic choice of the knots of the subdivision and of the degrees of the polynomials on each sub-interval. First we give the theoretical background underlying the method: the theoretical performances of our penalized least-squares estimator are based on non-asymptotic evaluations of a mean-square type risk. Then we explain how the algorithm is built and possibly accelerated (to handle the case where the number of observations is large), how the penalty term is chosen, and why it contains some constants requiring an empirical calibration. Lastly, a comparison with some well-known or recent wavelet methods is made: this brings out that our algorithm behaves in a very competitive way in terms of denoising and of compression.
Introduction

We consider in this paper the problem of estimating an unknown function f from [0,1] into IR when we observe the sequence Y_i, i = 1, . . . , n, satisfying

Y_i = f(x_i) + σ ε_i, i = 1, . . . , n,

for fixed x_i, i = 1, . . . , n in [0, 1] with 0 ≤ x_1 < x_2 < · · · < x_n ≤ 1. Most of the theoretical part of the work concerns any type of design, but only the equispaced design x_i = i/n is computationally considered and implemented. Here ε_i, 1 ≤ i ≤ n, is a sequence of independent and identically distributed random variables with mean 0 and variance 1. The positive constant σ is first assumed to be known; extensions to the case where it is unknown are proposed. We aim at estimating the function f with a data-driven procedure. In fact, we want to estimate f by piecewise standard and trigonometric polynomials, in a spirit analogous to, but more general than, e.g. Denison et al. (1998). We also want to choose among "all possible subsets of a large collection of pre-specified candidate knot sites" as well as among various degrees on each subinterval defined by two consecutive knots. Our method is based on recent theoretical results obtained by Baraud (2000, 2002) and Baraud et al. (2001a, b), who adapted to the regression problem general methods of model selection and adaptive estimation initiated by Barron and Cover (1991) and developed by Birgé and Massart (1998), Barron et al. (1999), and Birgé and Massart (2001). It is worth mentioning that a similar (theoretical) solution to our regression problem, in a context of regression with random design, is studied by Kohler (1999): he also proposes piecewise smooth regression functions to estimate the regression function, and he uses a penalized least-squares criterion as well.
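The setting above can be sketched in a few lines of code. The following is a minimal illustration, not the paper's algorithm: it simulates the fixed-design model Y_i = f(x_i) + σ ε_i with the equispaced design x_i = i/n, fits piecewise polynomials on dyadic partitions by least squares, and selects the partition and degree by a penalized criterion. The dyadic knot grid, the degree range, and the simple Mallows-type penalty 2σ²D/n are all assumptions made for the sketch; the paper's penalty involves empirically calibrated constants.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 256, 0.5
x = np.arange(1, n + 1) / n                 # equispaced fixed design x_i = i/n
f = lambda t: np.sin(2 * np.pi * t)         # illustrative regression function (assumed)
y = f(x) + sigma * rng.standard_normal(n)   # Y_i = f(x_i) + sigma * eps_i

def fit_piecewise(x, y, knots, degree):
    """Least-squares polynomial of the given degree on each cell of the
    partition of [0, 1] defined by `knots` (endpoints included)."""
    yhat = np.empty_like(y)
    for a, b in zip(knots[:-1], knots[1:]):
        idx = (x > a) & (x <= b)
        coef = np.polyfit(x[idx], y[idx], degree)
        yhat[idx] = np.polyval(coef, x[idx])
    return yhat

# Model collection: dyadic partitions crossed with per-cell degrees;
# select by penalized least squares (penalty form is an assumption).
best = None
for k in (1, 2, 4, 8, 16):                  # number of cells in the partition
    knots = np.linspace(0.0, 1.0, k + 1)
    for deg in range(4):                    # polynomial degree on each cell
        D = k * (deg + 1)                   # model dimension
        yhat = fit_piecewise(x, y, knots, deg)
        crit = np.mean((y - yhat) ** 2) + 2 * sigma**2 * D / n
        if best is None or crit < best[0]:
            best = (crit, k, deg, yhat)

crit, k_sel, deg_sel, yhat = best
print(k_sel, deg_sel, round(float(np.mean((yhat - f(x)) ** 2)), 4))
```

In the paper's procedure the knots are not restricted to a dyadic grid, each cell carries its own degree, and trigonometric polynomials are also allowed; this sketch only conveys the penalized-selection mechanism.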
His approach is similar to Baraud's (2000), but he uses Vapnik-Chervonenkis theory in place of Talagrand-type deviation inequalities. All the results we have in mind about fixed design regression have the specificity of giving non-asymptotic risk bounds and of dealing with adaptive estimators. The first results about adaptation in the minimax sense in that context were given by Efromovich and Pinsker (1984). Some asymptotic resul...