The problem of residual variance estimation consists of estimating the best possible generalization error obtainable by any model based on a finite sample of data. Even though it is a natural generalization of linear correlation, residual variance estimation in its general form has attracted relatively little attention in machine learning. In this paper, we examine four different residual variance estimators and analyze their properties both theoretically and experimentally to better understand their applicability in machine learning problems. The theoretical treatment differs from previous work by being based on a general formulation of the problem that also covers heteroscedastic noise, in contrast to previous work, which concentrates on homoscedastic, additive noise. In the second part of the paper, we demonstrate practical applications in input and model structure selection. The experimental results show that using residual variance estimators in these tasks gives good results, often with reduced computational complexity, while the nearest neighbor estimators are simple and easy to implement.
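To make the input-selection application concrete, the sketch below (illustrative only; the data, function names, and stopping rule are assumptions, not taken from the paper) performs greedy forward input selection driven by a nearest neighbor residual variance estimator (the Delta test): at each step it adds the input that most reduces the estimated noise variance, and stops when no candidate improves it.

```python
import numpy as np

def delta_test(X, y):
    # Delta test: 1/(2n) * sum_i (y[NN(i)] - y[i])^2,
    # where NN(i) is the nearest neighbor of x_i in input space.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)      # a point is not its own neighbor
    nn = d2.argmin(axis=1)
    return ((y[nn] - y) ** 2).sum() / (2 * len(y))

def forward_select(X, y):
    # Greedily add the input whose inclusion lowers the residual
    # variance estimate the most; stop when nothing improves.
    remaining = list(range(X.shape[1]))
    selected, best = [], np.inf
    while remaining:
        scores = {j: delta_test(X[:, selected + [j]], y) for j in remaining}
        j, s = min(scores.items(), key=lambda kv: kv[1])
        if s >= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = s
    return selected, best

# Toy example: only inputs 0 and 2 are relevant, inputs 1, 3, 4 are noise.
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-1.0, 1.0, (n, 5))
y = np.sin(np.pi * X[:, 0]) + X[:, 2] ** 2 + rng.normal(0.0, 0.1, n)
sel, score = forward_select(X, y)
print(sorted(sel), score)
```

Note the appeal for input selection: no model is ever fitted, so each candidate subset is scored in one nearest neighbor pass instead of a full training run.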
Abstract. The residual variance estimation problem is well-known in statistics and machine learning, with many applications, for example in the field of nonlinear modelling. In this paper, we show that the problem can be formulated in a general supervised learning context. Emphasis is on two widely used non-parametric techniques known as the Delta test and the Gamma test. Under some regularity assumptions, a novel proof of convergence of the two estimators is formulated and subsequently verified and compared on two meaningful case studies.
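As an illustration (a minimal sketch, not code from the paper), the Delta test can be written in a few lines of NumPy: it estimates the noise variance directly from the squared output differences between each point and its first nearest neighbor in input space.

```python
import numpy as np

def delta_test(X, y):
    """Delta test estimate of the residual (noise) variance:
    delta = 1/(2n) * sum_i (y[NN(i)] - y[i])^2,
    where NN(i) is the nearest neighbor of x_i in input space."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                             # exclude self-matches
    nn = d2.argmin(axis=1)                                   # first nearest neighbor of each point
    return ((y[nn] - y) ** 2).sum() / (2 * len(y))

# Sanity check: y = sin(x) plus Gaussian noise with variance 0.01.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2 * np.pi, size=(2000, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, size=2000)
est = delta_test(X, y)
print(est)  # should approach the true noise variance 0.01 as n grows
```

The Gamma test generalizes this idea by computing the same statistic over several neighbor orders k and extrapolating the regression line to zero input-space distance.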
In this paper, the problem of residual variance estimation is examined. The problem is analyzed in a general setting which covers non-additive heteroscedastic noise under non-iid sampling. To address the estimation problem, we suggest a method based on nearest neighbor graphs and discuss its convergence properties under the assumption of a Hölder continuous regression function. The universality of the estimator makes it an ideal tool in problems where little prior knowledge is available.
Abstract. In this paper, the moments of nearest neighbor distance distributions are examined. While the asymptotic form of such moments is well-known, the boundary effect has thus far resisted a rigorous analysis. Our goal is to develop a new technique that allows a closed-form high order expansion, where the boundaries are taken into account up to the first order. The resulting theoretical predictions are tested via simulations and found to be much more accurate than the first order approximation obtained by neglecting the boundaries. While our results are of theoretical interest, they also have important applications in statistics and physics. As a concrete example, we mention estimating Rényi entropies of probability distributions. Moreover, the algebraic technique developed may turn out to be useful in other, related problems including estimation of the Shannon differential entropy.
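The quantity under study is easy to probe by simulation (our own toy setup, not the paper's experiments): for n points drawn uniformly on [0, 1], the boundary-free first-order approximation of the mean nearest-neighbor distance is 1/(2n), and the discrepancy from the empirical mean reflects the boundary effect.

```python
import numpy as np

def mean_nn_distance_1d(n, trials, rng):
    """Average first-nearest-neighbor distance of n uniform points on [0, 1]."""
    total = 0.0
    for _ in range(trials):
        x = np.sort(rng.uniform(0.0, 1.0, n))
        gaps = np.diff(x)
        # NN distance of each point = min of its two adjacent gaps;
        # the two endpoints have only one neighbor (inf-padded side).
        nn = np.minimum(np.r_[np.inf, gaps], np.r_[gaps, np.inf])
        total += nn.mean()
    return total / trials

rng = np.random.default_rng(1)
n = 1000
emp = mean_nn_distance_1d(n, 200, rng)
asym = 1.0 / (2 * n)  # first-order approximation, boundaries neglected
print(emp, asym)
```

In one dimension the boundary contribution is a small O(1/n) relative correction (only the two endpoints are affected); in higher dimensions the boundary region occupies a growing fraction of the support, which is what makes the first-order correction derived in the paper practically relevant.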