Abstract. In this paper, we present a new algorithm to estimate a regression function in a fixed design regression model, by piecewise (standard and trigonometric) polynomials computed with an automatic choice of the knots of the subdivision and of the degrees of the polynomials on each sub-interval. First we give the theoretical background underlying the method: the theoretical performances of our penalized least-squares estimator are based on non-asymptotic evaluations of a mean-square type risk. Then we explain how the algorithm is built and possibly accelerated (to handle the case where the number of observations is large), how the penalty term is chosen, and why it contains some constants requiring an empirical calibration. Lastly, a comparison with some well-known or recent wavelet methods is made: this brings out that our algorithm behaves in a very competitive way in terms of denoising and of compression.
Introduction

We consider in this paper the problem of estimating an unknown function f from [0,1] into IR when we observe the sequence Y_i, i = 1, . . . , n, satisfying

Y_i = f(x_i) + σ ε_i, i = 1, . . . , n,

for fixed x_i, i = 1, . . . , n in [0, 1] with 0 ≤ x_1 < x_2 < · · · < x_n ≤ 1. Most of the theoretical part of the work concerns any type of design, but only the equispaced design x_i = i/n is computationally considered and implemented. Here ε_i, 1 ≤ i ≤ n, is a sequence of independent and identically distributed random variables with mean 0 and variance 1. The positive constant σ is first assumed to be known; extensions to the case where it is unknown are proposed. We aim at estimating the function f with a data-driven procedure. In fact, we want to estimate f by piecewise standard and trigonometric polynomials, in a spirit analogous to, but more general than, e.g. Denison et al. (1998). We also want to choose among "all possible subsets of a large collection of pre-specified candidate knot sites" as well as among various degrees on each subinterval defined by two consecutive knots. Our method is based on recent theoretical results obtained by Baraud (2000, 2002) and Baraud et al. (2001a, b), who adapted to the regression problem general methods of model selection and adaptive estimation initiated by Barron and Cover (1991) and developed by Birgé and Massart (1998), Barron et al. (1999), and Birgé and Massart (2001). It is worth mentioning that a similar (theoretical) solution to our regression problem, in a context of regression with random design, is studied by Kohler (1999): he also proposes piecewise smooth regression functions to estimate the regression function, and he uses a penalized least-squares criterion as well.
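The setting above can be sketched in a few lines of code. The following is a minimal illustration, not the paper's algorithm: it simulates the fixed-design model Y_i = f(x_i) + σ ε_i with the equispaced design x_i = i/n, fits piecewise polynomials on dyadic partitions by least squares, and selects the partition and degree by a penalized criterion. The dyadic knot grid, the degree range, and the simple Mallows-type penalty 2σ²D/n are all assumptions made for the sketch; the paper's penalty involves empirically calibrated constants.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 256, 0.5
x = np.arange(1, n + 1) / n                 # equispaced fixed design x_i = i/n
f = lambda t: np.sin(2 * np.pi * t)         # illustrative regression function (assumed)
y = f(x) + sigma * rng.standard_normal(n)   # Y_i = f(x_i) + sigma * eps_i

def fit_piecewise(x, y, knots, degree):
    """Least-squares polynomial of the given degree on each cell of the
    partition of [0, 1] defined by `knots` (endpoints included)."""
    yhat = np.empty_like(y)
    for a, b in zip(knots[:-1], knots[1:]):
        idx = (x > a) & (x <= b)
        coef = np.polyfit(x[idx], y[idx], degree)
        yhat[idx] = np.polyval(coef, x[idx])
    return yhat

# Model collection: dyadic partitions crossed with per-cell degrees;
# select by penalized least squares (penalty form is an assumption).
best = None
for k in (1, 2, 4, 8, 16):                  # number of cells in the partition
    knots = np.linspace(0.0, 1.0, k + 1)
    for deg in range(4):                    # polynomial degree on each cell
        D = k * (deg + 1)                   # model dimension
        yhat = fit_piecewise(x, y, knots, deg)
        crit = np.mean((y - yhat) ** 2) + 2 * sigma**2 * D / n
        if best is None or crit < best[0]:
            best = (crit, k, deg, yhat)

crit, k_sel, deg_sel, yhat = best
print(k_sel, deg_sel, round(float(np.mean((yhat - f(x)) ** 2)), 4))
```

In the paper's procedure the knots are not restricted to a dyadic grid, each cell carries its own degree, and trigonometric polynomials are also allowed; this sketch only conveys the penalized-selection mechanism.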
His approach is similar to Baraud's (2000), but he uses Vapnik-Chervonenkis theory in place of Talagrand-type deviation inequalities. All the results we have in mind about fixed design regression have the specificity of giving non-asymptotic risk bounds and of dealing with adaptive estimators. The first results about adaptation in the minimax sense in that context were given by Efromovich and Pinsker (1984). Some asymptotic resul...