Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMAP, and PaCMAP for Data Visualization

Wang, Yingfan; Huang, Haiyang; Rudin, Cynthia; Shaposhnik, Yaron

doi:10.48550/arxiv.2012.04456

Cited by 14 publications

(11 citation statements)

References 40 publications

“…Since t-SNE is highly popular, there are many experimental studies and guides for selecting parameters and validating results. We especially highlight two recent studies by Kobak & Berens [14] and Wang, Huang, Rudin & Shaposhnik [22]. We also point out the study by Böhm, Behrens & Kobak [5], which shows that force-based methods lie on an attraction-repulsion spectrum and can be empirically recovered by tuning the forces used to create the embedding.…”

mentioning

confidence: 73%

t-SNE, Forceful Colorings and Mean Field Limits

Zhang, Steinerberger

2021

Preprint

t-SNE is one of the most commonly used force-based nonlinear dimensionality reduction methods. This paper has two contributions: the first is forceful colorings, an idea that is also applicable to other force-based methods (UMAP, ForceAtlas2, ...). In every equilibrium, the attractive and repulsive forces acting on a particle cancel out: however, both the size and the direction of the attractive (or repulsive) forces acting on a particle are related to its properties: the force vector can serve as an additional feature. Secondly, we analyze the case of t-SNE acting on a single homogeneous cluster (modeled by affinities coming from the adjacency matrix of a random k-regular graph); we derive a mean-field model that leads to interesting questions in classical calculus of variations. The model predicts that, in the limit, the t-SNE embedding of a single perfectly homogeneous cluster is not a point but a thin annulus of diameter ∼ k^(−1/4) n^(−1/4). This is supported by numerical results. The mean field ansatz extends to other force-based dimensionality reduction methods.

mentioning

confidence: 73%

“…(2) that the underlying approach also extends to other attraction-repulsion based methods. Indeed, a similar type of analysis should be possible for many of the methods discussed in [5,22]. One reason there is so little theoretical work on t-SNE is the complexity of the setup: we are given a set of points X = {x 1 , .…”

Section: Mean Field Limit For t-SNE

mentioning

confidence: 99%

t-SNE, Forceful Colorings and Mean Field Limits

Zhang, Steinerberger

2021

Preprint

“…The t-SNE algorithm is probably the most popular among ML researchers. They often use it to visualize cluster structures learned by deep learning models [44,54,55]. While t-SNE often plots each data point as a small circle in a 2-D space, the nature of images provides us with the opportunity to directly plot a small thumbnail instead of a dot.…”

Section: Similarity-based Visualization Methods

mentioning

confidence: 99%

DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

Bertucci, Hamid, Anand, et al.

2022

Preprint

Fig. 1. With DendroMap, users can explore large-scale image datasets by overviewing the overall distributions and zooming down into hierarchies of image groups at multiple levels of abstraction. In this example, we visualize images of the CIFAR-100 dataset by hierarchically clustering the image representations obtained from a ResNet50 image classification model. (B) DendroMap View displays these clusters of images organized as a hierarchical structure by adapting Treemaps. By clicking on a cluster, a user can interactively (C) Zoom into that image group, revealing subgroups that replace and fill the available space with animation (see the submitted video). The user clicked on a cluster for organism images, which creates distinct subgroups of fish, insects, worms, fruits, and flowers. With (A) Sidebar View, the user can dynamically adjust the number of clusters to be displayed and inspect the class-level statistics.

“…Distribution shapes look similar for dev datasets. Next, we explore the vector data by reducing dimensionality to the 2D space using the Pairwise Controlled Manifold Approximation Projection (PaCMAP) algorithm (Wang et al, 2020). Figure 5 shows the distributions of all three types of embeddings in the train and validation (development) datasets for English, French, and Russian.…”

Section: A3 Qualitative Analysis Of Generated Glosses

mentioning

confidence: 99%

IRB-NLP at SemEval-2022 Task 1: Exploring the Relationship Between Words and Their Semantic Representations

Korenčić, Grubišić

2022

Preprint

What is the relation between a word and its description, or a word and its embedding? Both descriptions and embeddings are semantic representations of words. But, what information from the original word remains in these representations? Or more importantly, which information about a word do these two representations share? Definition Modeling and Reverse Dictionary are two opposite learning tasks that address these questions. The goal of the Definition Modeling task is to investigate the power of information laying inside a word embedding to express the meaning of the word in a humanly understandable way -as a dictionary definition. Conversely, the Reverse Dictionary task explores the ability to predict word embeddings directly from its definition. In this paper, by tackling these two tasks, we are exploring the relationship between words and their semantic representations. We present our findings based on the descriptive, exploratory, and predictive data analysis conducted on the CODWOE dataset. We give a detailed overview of the systems that we designed for Definition Modeling and Reverse Dictionary tasks, and that achieved top scores on SemEval-2022 CODWOE challenge in several subtasks. We hope that our experimental results concerning the predictive models and the data analyses we provide will prove useful in future explorations of word representations and their relationships.
