2020
DOI: 10.48550/arxiv.2012.04456
Preprint

Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMAP, and PaCMAP for Data Visualization

Abstract: Dimension reduction (DR) techniques such as t-SNE, UMAP, and TriMAP have demonstrated impressive visualization performance on many real world datasets. One tension that has always faced these methods is the trade-off between preservation of global structure and preservation of local structure: these methods can either handle one or the other, but not both. In this work, our main goal is to understand what aspects of DR methods are important for preserving both local and global structure: it is difficult to des…

Cited by 14 publications
(11 citation statements)
References 40 publications
“…Since t-SNE is highly popular, there are many experimental studies and guides for selecting parameters and validating results. We especially highlight two recent studies by Kobak & Berens [14] and Wang, Huang, Rudin & Shaposhnik [22]. We also point out the study by Böhm, Behrens & Kobak [5], which shows that force-based methods lie on an attraction-repulsion spectrum and can be empirically recovered by tuning the forces used to create the embedding.…”
mentioning
confidence: 73%
“…(2) that the underlying approach also extends to other attraction-repulsion based methods. Indeed, a similar type of analysis should be possible for many of the methods discussed in [5,22]. One reason there is so little theoretical work on t-SNE is the complexity of the setup: we are given a set of points X = {x_1, .…”
Section: Mean Field Limit For t-SNE
mentioning
confidence: 99%
“…The t-SNE algorithm is probably the most popular among ML researchers. They often use it to visualize cluster structures learned by deep learning models [44,54,55]. While t-SNE often plots each data point as a small circle in a 2-D space, the nature of images provides us with the opportunity to directly plot a small thumbnail instead of a dot.…”
Section: Similarity-based Visualization Methods
mentioning
confidence: 99%
“…Distribution shapes look similar for dev datasets. Next, we explore the vector data by reducing dimensionality to the 2D space using the Pairwise Controlled Manifold Approximation Projection (PaCMAP) algorithm (Wang et al, 2020). Figure 5 shows the distributions of all three types of embeddings in the train and validation (development) datasets for English, French, and Russian.…”
Section: A3 Qualitative Analysis Of Generated Glosses
mentioning
confidence: 99%