In this paper we offer a unified approach to the problem of nonparametric regression on the unit interval. It is based on a universal, honest and non-asymptotic confidence region A_n which is defined by a set of linear inequalities involving the values of the functions at the design points. Interest will typically centre on the simplest functions in A_n, where simplicity can be defined in terms of shape (number of local extremes, intervals of convexity/concavity), smoothness (bounds on derivatives), or a combination of both. Once some form of regularization has been decided upon, the confidence region can be used to provide honest non-asymptotic confidence bounds which are less informative but conceptually much simpler.
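The idea of a confidence region defined by linear inequalities on the function values at the design points can be illustrated with a minimal sketch. The abstract does not state the inequalities, so the form used here is an assumption: for every interval of consecutive design points, the sum of residuals scaled by the square root of the interval length must not exceed sigma * sqrt(tau * log n). The function name `in_confidence_region`, the constant `tau` and the choice of all intervals are hypothetical.

```python
import math


def in_confidence_region(y, g, sigma, tau=3.0):
    """Check an assumed multiscale residual condition for candidate values g.

    y     : observations at the design points
    g     : candidate function values at the same design points
    sigma : noise scale (assumed known for this sketch)
    tau   : hypothetical tuning constant
    Each check |sum_{i in I} (y_i - g_i)| <= sqrt(|I|) * bound is a pair of
    linear inequalities in the values g_i.
    """
    n = len(y)
    bound = sigma * math.sqrt(tau * math.log(n))
    # Prefix sums of residuals make every interval sum an O(1) lookup.
    prefix = [0.0]
    for yi, gi in zip(y, g):
        prefix.append(prefix[-1] + (yi - gi))
    for j in range(n):
        for k in range(j + 1, n + 1):
            scaled = (prefix[k] - prefix[j]) / math.sqrt(k - j)
            if abs(scaled) > bound:
                return False
    return True
```

Checking all intervals costs O(n^2) operations; a sparser family of intervals could be substituted without changing the structure of the check.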
Given a data set (t_i, y_i), i = 1, ..., n, with the t_i ∈ [0, 1], non-parametric regression is concerned with the problem of specifying a suitable function f_n : [0, 1] → R such that the data can be reasonably approximated by the points (t_i, f_n(t_i)), i = 1, ..., n. If a data set exhibits large variations in local behaviour, for example large peaks as in spectroscopy data, then the method must be able to adapt to the local changes in smoothness. Whilst many methods are able to accomplish this they are less successful at adapting derivatives. In this paper we show how the goal of local adaptivity of the function and its first and second derivatives can be attained in a simple manner using weighted smoothing splines. A residual based concept of approximation is used which forces local adaptivity of the regression function together with a global regularization which makes the function as smooth as possible subject to the approximation constraints.
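The role of the weights can be seen in a small sketch. This is not the smoothing spline of the paper but a discrete analogue on equally spaced design points, swapped in to keep the example short: it minimises a weighted sum of squared residuals plus a penalty on squared second differences. The name `weighted_smoother` and the parameters `weights` and `lam` are hypothetical, and all weights are assumed to be positive.

```python
def weighted_smoother(y, weights, lam):
    """Minimise sum_i w_i (y_i - f_i)^2 + lam * sum_i (f_{i-1} - 2 f_i + f_{i+1})^2.

    Solves the normal equations (W + lam * D^T D) f = W y, where D is the
    second-difference matrix, by Gaussian elimination with partial pivoting.
    """
    n = len(y)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = weights[i]
    stencil = (1.0, -2.0, 1.0)
    # Accumulate lam * D^T D one second-difference row at a time.
    for r in range(n - 2):
        for p in range(3):
            for q in range(3):
                a[r + p][r + q] += lam * stencil[p] * stencil[q]
    b = [weights[i] * y[i] for i in range(n)]
    # Forward elimination.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            m = a[r][col] / a[col][col]
            if m != 0.0:
                for c in range(col, n):
                    a[r][c] -= m * a[col][c]
                b[r] -= m * b[col]
    # Back substitution.
    f = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][c] * f[c] for c in range(i + 1, n))
        f[i] = s / a[i][i]
    return f
```

Raising the weight at a design point pulls the fit towards the observation there, so a residual-based scheme could increase weights only where the residuals indicate a poor local approximation and leave the fit smooth elsewhere.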
We consider data consisting of photon counts of diffracted X-ray radiation as a function of the angle of diffraction. The problem is to determine the positions, powers and shapes of the relevant peaks. An additional difficulty is that the power of the peaks is to be measured from a baseline which itself must be identified. Most methods of de-noising data of this kind do not explicitly take into account the modality of the final estimate. The residual-based procedure we propose uses the so-called taut string method, which minimizes the number of peaks subject to a tube constraint on the integrated data. The baseline is identified by combining the result of the taut string with an estimate of the first derivative of the baseline obtained using a weighted smoothing spline. Finally, each individual peak is expressed as a finite sum of kernels chosen from a parametric family.
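The two ingredients of the taut string criterion, the tube constraint on the integrated data and the number of peaks, can be made concrete with a short sketch. It does not compute the taut string itself; it only checks whether a candidate lies in the tube and counts its local extremes. The function names, the normalisation of the cumulative sums by n and the tube radius `eps` are assumptions made for illustration.

```python
def cumulative(values):
    """Integrated data: normalised cumulative sums, starting at 0."""
    n = len(values)
    total, out = 0.0, [0.0]
    for v in values:
        total += v
        out.append(total / n)
    return out


def within_tube(y, f, eps):
    """True if the integrated candidate stays within eps of the integrated data."""
    return all(abs(a - b) <= eps for a, b in zip(cumulative(y), cumulative(f)))


def count_local_extremes(f):
    """Number of interior local extremes: sign changes of the nonzero differences."""
    signs = []
    for a, b in zip(f, f[1:]):
        if b != a:
            s = 1 if b > a else -1
            if not signs or signs[-1] != s:
                signs.append(s)
    return max(len(signs) - 1, 0)
```

Among all candidates for which `within_tube` holds, the criterion described above prefers one with the smallest value of `count_local_extremes`; widening the tube admits simpler candidates at the cost of a coarser approximation.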