2005
DOI: 10.1111/j.1467-9868.2005.00510.x

Geometric Representation of High Dimension, Low Sample Size Data

Abstract: High dimension, low sample size data are emerging in various areas of science. We find a common structure underlying many such data sets by using a non-standard type of asymptotics: the dimension tends to ∞ while the sample size is fixed. Our analysis shows a tendency for the data to lie deterministically at the vertices of a regular simplex. Essentially all the randomness in the data appears only as a random rotation of this simplex. This geometric representation is used to obtain several new statistical insi…
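As a rough numerical check of the abstract's claim, the Python sketch below assumes the simplest setting, i.i.d. standard Gaussian coordinates (the paper's own conditions are more general). For two independent such vectors, E‖X − Y‖² = 2d, so the limiting simplex has edge length √(2d). The sketch reports how far each pairwise distance is from that value as d grows with n fixed. The function names are illustrative, not taken from the paper.

```python
import math
import random


def pairwise_distance_ratios(n, d, seed=0):
    """Draw n i.i.d. N(0, I_d) vectors (assumed Gaussian model) and return
    every pairwise Euclidean distance divided by sqrt(2d), the edge length
    of the limiting regular simplex under that model."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ratios = []
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            ratios.append(math.sqrt(sq) / math.sqrt(2 * d))
    return ratios


def max_deviation(n, d, seed=0):
    """Largest |ratio - 1| over all pairs: how far the sample is from a
    perfectly regular simplex with edge length sqrt(2d)."""
    return max(abs(r - 1.0) for r in pairwise_distance_ratios(n, d, seed))


if __name__ == "__main__":
    # n stays fixed at 5 while the dimension d grows.
    for d in (10, 1_000, 50_000):
        print(d, round(max_deviation(n=5, d=d), 4))
```

Under this assumption each ratio has a standard deviation of roughly 1/√(2d), so the largest deviation should shrink as d increases; the fixed seed only makes runs repeatable.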

Cited by 427 publications (411 citation statements: 19 supporting, 392 mentioning, 0 contrasting)
References: 21 publications
“…As Hall et al (2005) also show, Gaussian data behave in the same way and a demonstration of this is seen in Table 1. To provide an idea of consensus of these results, the 200,000-dimensional Gaussian was repeated and yielded on successive runs values of the ultrametricity measure of: 0.96, 0.98, 0.96.…”
Section: Quantifying Degree Of Ultrametricity (supporting)
confidence: 60%
“…, x_n ∈ S^d with n ≪ d, which is frequently referred to as the high-dimension low-sample-size situation (Hall et al, 2005; Dryden, 2005; Ahn et al, 2007). In Euclidean space, the dimension of the data can be reduced to n without losing any information.…”
Section: Computational Algorithm (mentioning)
confidence: 99%
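The quoted remark that n points in Euclidean space can be reduced to n dimensions without loss holds because the points span a linear subspace of dimension at most n. A minimal sketch, assuming plain Python lists for vectors: orthonormalize the points with modified Gram-Schmidt and record each point's coordinates in that basis. Norms, inner products, and pairwise distances are unchanged, so points on a unit sphere stay on a unit sphere. `reduce_to_span` and `dist` are hypothetical helper names.

```python
import math
import random


def reduce_to_span(points, tol=1e-12):
    """Express n points in R^d in coordinates with respect to an orthonormal
    basis of their linear span (modified Gram-Schmidt). Returns n vectors of
    length k <= n; Euclidean geometry is preserved."""
    basis = []
    for p in points:
        v = list(p)
        for q in basis:
            # Remove the component along each basis vector found so far.
            c = sum(a * b for a, b in zip(v, q))
            v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        scale = math.sqrt(sum(a * a for a in p))
        if norm > tol * max(1.0, scale):  # skip points already in the span
            basis.append([a / norm for a in v])
    # Coordinates of each point in the orthonormal basis.
    return [[sum(a * b for a, b in zip(p, q)) for q in basis] for p in points]


def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
    low = reduce_to_span(pts)
    print(len(low[0]))  # 4 coordinates per point instead of 1000
    print(dist(pts[0], pts[1]), dist(low[0], low[1]))
```

In practice a QR or SVD routine from a numerical library does the same job with better numerical stability; the loop form is used here only to keep the sketch dependency-free.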
“…We consider asymptotics of the method for d → ∞ with the sample size n fixed. Hall et al (2005) first demonstrated the insight available from such asymptotics. They showed that, under some conditions, each data point in a sample of size n tends to lie near a vertex of a regular n-simplex and all the randomness in the data appears in the form of a random rotation of this simplex.…”
Section: ·1 Four Clusters Case (mentioning)
confidence: 99%
“…The regularity conditions for the geometric representation in Hall et al (2005) require that the entries of the data vector satisfy a ρ-mixing condition. Ahn et al (2007) gave a milder condition using asymptotic properties of the sample covariance.…”
Section: ·1 Four Clusters Case (mentioning)
confidence: 99%
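To illustrate that independence across coordinates is not essential, the sketch below swaps in a stationary Gaussian AR(1) sequence, a standard example of a ρ-mixing process when |φ| < 1. This model choice is an assumption for illustration, not the setting used in either paper. Coordinates within each vector are correlated, but the n sample vectors are independent. With innovation variance 1 the coordinate variance is σ² = 1/(1 − φ²), so the pairwise distances should approach √(2σ²d). The function names are hypothetical.

```python
import math
import random


def ar1_vector(d, phi, rng):
    """One length-d stationary Gaussian AR(1) sequence
    X_k = phi * X_{k-1} + e_k, e_k ~ N(0, 1), started from the
    stationary law N(0, 1 / (1 - phi**2)). Assumes |phi| < 1."""
    sigma2 = 1.0 / (1.0 - phi * phi)
    x = [rng.gauss(0.0, math.sqrt(sigma2))]
    for _ in range(d - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def ar1_distance_ratios(n, d, phi, seed=0):
    """Pairwise distances of n independent AR(1) vectors, each divided by
    sqrt(2 * sigma2 * d), the limiting simplex edge length when the
    coordinate variance is sigma2 = 1 / (1 - phi**2)."""
    rng = random.Random(seed)
    pts = [ar1_vector(d, phi, rng) for _ in range(n)]
    sigma2 = 1.0 / (1.0 - phi * phi)
    scale = math.sqrt(2.0 * sigma2 * d)
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(pts[i], pts[j]))) / scale
        for i in range(n)
        for j in range(i + 1, n)
    ]


if __name__ == "__main__":
    # Largest deviation from a regular simplex as d grows, with phi = 0.5.
    for d in (10, 1_000, 50_000):
        ratios = ar1_distance_ratios(n=5, d=d, phi=0.5)
        print(d, round(max(abs(r - 1.0) for r in ratios), 4))
```

Correlation between neighbouring coordinates (φ ≠ 0) inflates the variance of ‖X − Y‖² relative to the i.i.d. case, so convergence is slower, but the ratios still tend to 1 as d grows.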