Projection pursuit is a classical exploratory data analysis method for detecting interesting low-dimensional structure in multivariate data. Originally, projection pursuit was applied mostly to data of moderately low dimension. Motivated by contemporary applications, we here study its properties in high-dimensional settings. Specifically, we analyze the asymptotic properties of projection pursuit on structureless multivariate Gaussian data with an identity covariance, as both dimension p and sample size n tend to infinity, with p/n → γ. Our main results are that (i) if γ = ∞, then there exist projections whose corresponding empirical cumulative distribution function can approximate any arbitrary distribution; and (ii) if γ ∈ (0, ∞), not all limiting distributions are possible. However, depending on the value of γ, various non-Gaussian distributions may still be approximated. In contrast, if we restrict to sparse projections, involving only a few of the p variables, then asymptotically all empirical cumulative distribution functions are Gaussian. And (iii) if γ = 0, then asymptotically all projections are Gaussian. Some of these results extend to mean-centered sub-Gaussian data and to projections into k dimensions. Hence, in the "small n, large p" setting, unless sparsity is enforced, and regardless of the chosen projection index, projection pursuit may detect apparent structure that has no statistical significance. Furthermore, our work reveals fundamental limitations on the ability to detect non-Gaussian signals in high-dimensional data, in particular through independent component analysis and related non-Gaussian component analysis.
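A minimal numerical sketch of the phenomenon behind (i), under assumptions not taken from the paper (NumPy, the sizes n = 100 and p = 2000, and an arbitrary bimodal target): when p ≥ n, the n × p noise matrix typically has full row rank, so the linear system Xv = y is solvable for any target vector y, and the corresponding unit-norm projection reproduces the target values up to scale.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 2000                      # "small n, large p" regime (illustrative sizes)
X = rng.standard_normal((n, p))       # structureless data: i.i.d. N(0, I_p) rows

# Arbitrary non-Gaussian target: a clearly bimodal sample.
y = np.concatenate([rng.normal(-3, 0.3, n // 2), rng.normal(3, 0.3, n // 2)])

# rank(X) = n < p almost surely, so X v = y has a solution;
# lstsq returns the minimum-norm one, hence X v reproduces y exactly.
v, *_ = np.linalg.lstsq(X, y, rcond=None)
v /= np.linalg.norm(v)                # unit-norm projection direction

proj = X @ v                          # "noise" projected onto the found direction
print(np.corrcoef(proj, y)[0, 1])     # ~1.0: the projection mimics the bimodal target
```

Since Xv equals y exactly (up to the normalization of v), the projected sample is perfectly bimodal even though the data are pure noise, which is precisely why unrestricted projection pursuit can report apparent structure with no statistical significance.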
In this paper, we study two problems: (1) estimation of a d-dimensional log-concave distribution and (2) bounded multivariate convex regression with random design, with an underlying log-concave density or a compactly supported distribution with a continuous density. First, we show that for all d ≥ 4 the maximum likelihood estimators of both problems achieve an optimal risk of Θ_d(n^(−2/(d+1)))* (up to a logarithmic factor) in terms of squared Hellinger distance and squared L_2 distance, respectively. Previously, the optimality of both these estimators was known only for d ≤ 3. We also prove that the ε-entropy numbers of the two aforementioned families are equal up to logarithmic factors; the global ε-entropy, log N_2(F, ε, P), provides, under appropriate conditions, the global minimax rates for estimation with respect to squared L_2(P) (for regression) and squared Hellinger (for density estimation) measures of closeness [Yang and Barron, 1999]. Here N_2(F, ε, P) is the covering number of F with respect to L_2(P) at scale ε, defined as the smallest number of functions f_1, …, f_N ∈ F such that for every f ∈ F there exists j with ‖f − f_j‖_{L_2(P)} ≤ ε. We complement these results by proving a sharp bound Θ_d(n^(−2/(d+4))) on the minimax rate (up to logarithmic factors) with respect to the total variation distance. Finally, we prove that estimating a log-concave density, even a uniform distribution on a convex set, up to a fixed accuracy requires a number of samples at least exponential in the dimension. We do so by improving the dimensional constant in the best known lower bound for the minimax rate.
* In the regression setting, this bound is tight for certain measures, e.g., when the underlying distribution is uniform on a ball; however, for some log-concave measures the minimax rate is of a different order (see Remarks 1 and 2 for more details).
1. We acknowledge the work by Han [2019], which appeared a few months after our initial manuscript became available on arXiv. According to a recent personal communication with the author, some of his results were obtained in his PhD thesis, which was available online before our initial manuscript; the techniques used there are very similar to our approach.
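As a rough numerical reading of the rates above (an illustration, not a computation from the paper: it ignores the dimension-dependent constants, which the lower bound shows must themselves grow exponentially in d), one can tabulate the sample sizes implied by n^(−2/(d+1)) and n^(−2/(d+4)) at a fixed target accuracy:

```python
# Rough illustration: sample sizes implied by the rates n^(-2/(d+1)) (squared
# Hellinger, MLE) and n^(-2/(d+4)) (total variation, minimax) at accuracy eps.
# Dimension-dependent constants are deliberately ignored.
eps = 0.1
for d in (1, 2, 3, 4, 8, 16):
    n_hel = eps ** (-(d + 1) / 2)   # solve n^(-2/(d+1)) = eps for n
    n_tv = eps ** (-(d + 4) / 2)    # solve n^(-2/(d+4)) = eps for n
    print(f"d={d:2d}  n(Hellinger) ~ {n_hel:.3g}  n(TV) ~ {n_tv:.3g}")
```

Even with the constants ignored, the required sample size grows rapidly with d, in line with the exponential-in-dimension lower bound stated above.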
We establish estimates for the asymptotic best approximation of the Euclidean unit ball by polytopes under a notion of distance induced by the intrinsic volumes. We also introduce a notion of distance between convex bodies that is induced by the Wills functional and apply it to derive asymptotically sharp bounds for approximating the ball in high dimensions. Remarkably, it turns out that there is a polytope that is almost optimal with respect to all intrinsic volumes simultaneously, up to absolute constants. Finally, we establish asymptotic formulas for the best approximation of smooth convex bodies by polytopes with respect to a distance induced by dual volumes, which originate from Lutwak’s dual Brunn–Minkowski theory.
We prove that there is an absolute constant C such that for every n ≥ 2 and N ≥ 10^n, there exists a polytope P_{n,N} in R^n with at most N facets that approximates the n-dimensional Euclidean unit ball D_n to within the bounds we establish, in the metrics under consideration. This result closes gaps in several papers of Hoehner, Ludwig, Schütt and Werner. The upper bounds are optimal up to absolute constants, and they show that a polytope with an exponential number of facets can approximate the n-dimensional Euclidean ball with respect to these metrics.
Our main contribution is a concentration inequality for the symmetric volume difference of a C^2 convex body with positive Gaussian curvature and a circumscribed random polytope with a restricted number of facets, valid for any probability measure on the boundary with a positive density function. We also show that the Dirichlet–Voronoi tiling numbers satisfy div_{n−1} = (2πe)^(−1)(n + ln n) + O(1), which improves a classical result of Zador by a factor of o(n). In addition, we pose an open problem that is a natural geometric generalization of the famous and fundamental "balls and bins" problem from probability; this problem is tightly connected to the optimality of random polytopes in high dimensions. Finally, as an application of the aforementioned results, we derive a lower bound for the maximal Mahler volume product of polytopes with a restricted number of vertices or facets.
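The quantity in the first sentence can be made concrete in the simplest possible case. The sketch below is only a two-dimensional toy, under assumptions not from the paper (the body is the unit disk, the boundary measure is uniform on the circle, N = 200 facets, NumPy available): it samples circumscribed random polygons whose N edges are tangent at random boundary points, computes their symmetric area difference from the disk, and compares with the optimal regular circumscribed N-gon.

```python
import numpy as np

rng = np.random.default_rng(1)

def excess_area(N, rng):
    """Symmetric volume (area) difference between the unit disk and a random
    circumscribed polygon whose N facets are tangent at i.i.d. uniform points."""
    theta = np.sort(rng.uniform(0.0, 2.0 * np.pi, N))
    gaps = np.diff(np.append(theta, theta[0] + 2.0 * np.pi))  # N angular gaps
    if np.any(gaps >= np.pi):          # consecutive tangents fail to bound a vertex
        return None
    return np.sum(np.tan(gaps / 2.0)) - np.pi   # polygon area minus disk area

N, trials = 200, 2000
vals = np.array([v for _ in range(trials) if (v := excess_area(N, rng)) is not None])
optimal = N * np.tan(np.pi / N) - np.pi         # best possible: regular circumscribed N-gon
print(f"random: mean={vals.mean():.5f}  std={vals.std():.5f}  optimal={optimal:.5f}")
```

The spread of the excess area across trials, relative to its mean, gives a low-dimensional hint of the concentration behaviour that the paper quantifies in high dimensions.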