2019
DOI: 10.1101/706101
Preprint

Seeing distinct groups where there are none: spurious patterns from between-group PCA

Abstract: Using sampling experiments, we found that, when there are fewer groups than variables, between-groups PCA (bgPCA) may suggest surprisingly distinct differences among groups for data in which none exist. While apparently not noticed before, the reasons for this problem are easy to understand. A bgPCA captures the g − 1 dimensions of variation among the g group means, but only a fraction of the Σn_i − g …
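
The effect described in the abstract can be reproduced with a small sampling experiment. The following is a minimal sketch, not the authors' code: it assumes isotropic Gaussian noise, three equal-sized groups with arbitrary labels, and an unweighted bgPCA computed from the group means. The helper names `bgpca_scores` and `between_group_share` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def bgpca_scores(X, labels):
    """Project all observations onto the principal axes of the group means."""
    Xc = X - X.mean(axis=0)                       # center on the grand mean
    groups = np.unique(labels)
    means = np.vstack([Xc[labels == g].mean(axis=0) for g in groups])
    means = means - means.mean(axis=0)            # center the g group means
    # SVD of the g x p matrix of means; at most g-1 axes carry variance
    _, _, Vt = np.linalg.svd(means, full_matrices=False)
    axes = Vt[: len(groups) - 1].T                # p x (g-1) loading matrix
    return Xc @ axes

def between_group_share(scores, labels):
    """Fraction of the total sum of squares of the scores lying between groups."""
    centered = scores - scores.mean(axis=0)
    total = (centered ** 2).sum()
    between = sum(
        (labels == g).sum() * (centered[labels == g].mean(axis=0) ** 2).sum()
        for g in np.unique(labels)
    )
    return between / total

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)              # 3 arbitrary groups of 20 each
shares = {}
for p in (5, 500):                                # few vs. many variables
    X = rng.standard_normal((labels.size, p))     # pure noise: no true group differences
    shares[p] = between_group_share(bgpca_scores(X, labels), labels)
    print(f"p={p:4d}  between-group share of bgPC score variance: {shares[p]:.2f}")
```

Under these assumptions, the share of score variance lying between groups should be small when there are few variables and large when the variables far outnumber the specimens, even though the group labels carry no information.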

Cited by 13 publications (21 citation statements)
References 60 publications
“…Similarly, the score on bgPC1 from the simulation at upper center correlates 0.83 with the actual simulated factor score, and likewise correctly fails to separate the groups, whereas once again component bgPC2 is wholly fictional. Note the striking similarity of the panel at upper center in this figure to the lower right panel of Figure 1 of Cardini et al (2019), a three-group analysis based on a real GMM data set, although likewise with wholly fictional groups.…”
Section: Some Aspects Of Truth Persist In Spite Of the Pathologies
Citation type: mentioning
confidence: 54%
“…Acknowledgements Andrea Cardini (University of Modena and University of Western Australia) was the first in our group to notice bgPCA's clustering pathology, and the idea of a pair of papers scrutinizing applications of bgPCA to high-dimensional GMM data sets, particularly insofar as they ignore the implications of the MPT, was his. Our discussion sharpened over many months of interaction also involving Jim Rohlf (Stony Brook University), and Paul O'Higgins (University of York), discussions leading to Cardini et al (2019) as well as this paper. Norm MacLeod (London and University of Nanjing) guided me through a thoughtful review of the twentieth-century origins of the method criticized here, and Michael Perlman (University of Washington) was a helpful guide into the literature of the MPT at the time all this work began.…”
Section: Concluding Observations
Citation type: mentioning
confidence: 87%
“…We performed bgPCA because it extracts fewer principal components that explain most of the variation in the data; as a result, a low‐dimensional PCA plot is reliable for understanding high dimensional multivariate data. However, since bgPCA can suffer from limitations (Cardini, O’Higgins & Rohlf, 2019), we also tested independently for significant differences between species and groups based on morphological data using permutations multivariate analysis of variance (PERMANOVA) (Anderson, 2001). PERMANOVA tests the null hypothesis that the centroids and dispersion of the groups are equivalent for all groups.…”
Section: Methods
Citation type: mentioning
confidence: 99%