From a dataset of input-output vectors, the Gamma test estimates the variance of the noise on an output modulo any smooth model with bounded partial derivatives. We present a proof of the Gamma test under fairly weak hypotheses.
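As an illustration of the quantity being estimated, the following is a minimal sketch of the Gamma test in its commonly described form: for each k up to p, average the squared distance to the k-th nearest neighbour in input space (delta) and half the squared difference of the corresponding outputs (gamma), then take the intercept of the regression of gamma on delta as the noise-variance estimate. The function names, the default p = 10, the brute-force neighbour search, and the synthetic sine data are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def gamma_test(xs, ys, p=10):
    """Return (intercept, slope) of the gamma-vs-delta regression.

    xs: list of input vectors (tuples of floats); ys: list of scalar outputs.
    p:  number of near neighbours used (illustrative default, needs len(xs) > p).
    The intercept is the estimate of the noise variance on the output.
    """
    n = len(xs)
    deltas = [0.0] * p  # delta(k): mean squared distance to the k-th neighbour
    gammas = [0.0] * p  # gamma(k): half the mean squared output difference
    for i in range(n):
        # Brute-force search, O(n^2) overall; chosen for clarity, not speed.
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(xs[i], xs[j])), j)
            for j in range(n)
            if j != i
        )
        for k in range(p):
            dist2, j = neighbours[k]
            deltas[k] += dist2 / n
            gammas[k] += (ys[j] - ys[i]) ** 2 / (2 * n)
    # Ordinary least-squares line through the p points (delta(k), gamma(k)).
    mean_d = sum(deltas) / p
    mean_g = sum(gammas) / p
    sxx = sum((d - mean_d) ** 2 for d in deltas)
    sxy = sum((d - mean_d) * (g - mean_g) for d, g in zip(deltas, gammas))
    slope = sxy / sxx
    intercept = mean_g - slope * mean_d
    return intercept, slope


def make_noisy_sine(n, sigma, seed=0):
    """Hypothetical test data: y = sin(2*pi*x) + Gaussian noise of std sigma."""
    rng = random.Random(seed)
    xs = [(rng.random(),) for _ in range(n)]
    ys = [math.sin(2 * math.pi * x[0]) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_noisy_sine(500, 0.3)
    estimate, _ = gamma_test(xs, ys)
    print("estimated noise variance:", estimate, "(true value 0.09)")
```

With a smooth underlying function and enough points, the intercept should lie close to the true noise variance (0.09 in this synthetic example), without any model of the sine function being fitted.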
Let C be a compact convex body in R^m and consider a set of points selected at random from C according to some well-behaved sampling distribution. We obtain an asymptotic expression for the positive moments of the kth near-neighbour distance distribution as the number of points increases to infinity.
Mutual information quantifies the determinism that exists in a relationship between random variables, and thus plays an important role in exploratory data analysis. We investigate a class of non-parametric estimators for mutual information, based on the nearest neighbour structure of observations in both the joint and marginal spaces. Unless both marginal spaces are one-dimensional, we demonstrate that a well-known estimator of this type can be computationally expensive under certain conditions, and propose a computationally efficient alternative that has a time complexity of order O(N log N) as the number of observations N → ∞.
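To make the idea of a nearest-neighbour mutual information estimator concrete, here is a minimal sketch of one widely used estimator of this kind, the first Kraskov-Stögbauer-Grassberger (KSG) estimator, for scalar x and y. Whether this is the "well-known estimator" the abstract refers to is an assumption. The function names, the choice k = 4, and the brute-force O(N^2) counting are illustrative only; this is not the O(N log N) alternative the abstract proposes.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def ksg_mutual_information(xs, ys, k=4):
    """KSG (type 1) estimate of I(X; Y) in nats for scalar samples xs, ys."""
    n = len(xs)
    # Digamma at positive integers: psi(1) = -gamma, psi(m + 1) = psi(m) + 1/m.
    psi = [0.0] * (n + 1)
    psi[1] = -EULER_GAMMA
    for m in range(1, n):
        psi[m + 1] = psi[m] + 1.0 / m
    total = 0.0
    for i in range(n):
        # Max-norm distances to all other points in the joint (x, y) space.
        joint = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n)
            if j != i
        )
        eps = joint[k - 1]  # distance to the k-th nearest joint neighbour
        # Count marginal neighbours strictly closer than eps.
        nx = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += psi[nx + 1] + psi[ny + 1]
    return psi[k] + psi[n] - total / n


def correlated_gaussians(n, rho, seed=0):
    """Hypothetical test data: standard normal pairs with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(g1)
        ys.append(rho * g1 + math.sqrt(1.0 - rho * rho) * g2)
    return xs, ys


if __name__ == "__main__":
    xs, ys = correlated_gaussians(600, 0.9)
    exact = -0.5 * math.log(1.0 - 0.9 ** 2)  # closed form for bivariate normals
    print("KSG estimate:", ksg_mutual_information(xs, ys), "exact:", exact)
```

For bivariate normal data the exact mutual information is -0.5 ln(1 - rho^2), which gives a convenient sanity check for the estimate.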
In practical data analysis, methods based on proximity (near-neighbour) relationships between sample points are important because these relations can be computed in time O(n log n) as the number of points n → ∞. Associated with such methods is a class of random variables defined to be functions of a given point and its nearest neighbours in the sample. If the sample points are independent and identically distributed, the associated random variables will also be identically distributed but not independent. Despite this, we show that random variables of this type satisfy a strong law of large numbers, in the sense that their sample means converge to their expected values almost surely as the number of sample points n → ∞.
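A small simulation can illustrate both points in the simplest setting, points on the real line, where a single sort yields every nearest-neighbour distance in O(n log n) time. The statistic chosen here (the nearest-neighbour distance scaled by n), the uniform sampling on [0, 1], and the function names are assumptions made for the example. For uniform samples, a short calculation with the spacings of order statistics gives an expected scaled mean of (n + 2) / (2(n + 1)), which tends to 1/2.

```python
import random


def nearest_neighbour_distances_1d(points):
    """Nearest-neighbour distance of each point, listed in sorted-point order."""
    order = sorted(points)  # the O(n log n) step
    n = len(order)
    dists = []
    for i, x in enumerate(order):
        # In one dimension the nearest neighbour is an adjacent sorted point.
        left = x - order[i - 1] if i > 0 else float("inf")
        right = order[i + 1] - x if i < n - 1 else float("inf")
        dists.append(min(left, right))
    return dists


def scaled_mean_nn_distance(n, seed=0):
    """Sample mean of n * (nearest-neighbour distance) for n uniform points."""
    rng = random.Random(seed)
    points = [rng.random() for _ in range(n)]
    dists = nearest_neighbour_distances_1d(points)
    return n * (sum(dists) / n)


if __name__ == "__main__":
    for n in (100, 1000, 10000, 100000):
        print(n, scaled_mean_nn_distance(n))
```

Although the individual distances are dependent (neighbouring points share spacings), the printed sample means should settle near 1/2 as n grows, in line with the strong law described above.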