Leland McInnes scite author profile

UMAP: Uniform Manifold Approximation and Projection

UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical, scalable algorithm that applies to real-world data. The UMAP algorithm is competitive with t-SNE for visualization quality and arguably preserves more of the global structure, with superior run-time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general-purpose dimension reduction technique for machine learning.
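
The UMAP paper describes how one point's k nearest-neighbour distances become fuzzy membership strengths. Here ρ is the distance to the nearest neighbour, and σ is chosen so that the memberships sum to log2(k). The following is a simplified pure-Python sketch of that step under those assumptions. The function name smooth_knn_weights and the bisection details are illustrative; this is not code from the umap-learn library.

```python
import math


def smooth_knn_weights(dists, n_iter=64, tol=1e-5):
    """Fuzzy membership strengths from one point to its k nearest neighbours.

    dists: distances to the k >= 2 nearest neighbours (the point itself excluded).
    rho is the nearest-neighbour distance; sigma is found by bisection so that
    sum(exp(-max(0, d - rho) / sigma)) is approximately log2(k).
    """
    k = len(dists)
    target = math.log2(k)
    rho = min(dists)
    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(n_iter):
        total = sum(math.exp(-max(d - rho, 0.0) / sigma) for d in dists)
        if abs(total - target) < tol:
            break
        if total > target:
            # Memberships too large: the answer lies below the current sigma.
            hi = sigma
            sigma = (lo + hi) / 2
        else:
            # Memberships too small: double sigma until bracketed, then bisect.
            lo = sigma
            sigma = sigma * 2 if hi == math.inf else (lo + hi) / 2
    weights = [math.exp(-max(d - rho, 0.0) / sigma) for d in dists]
    return weights, rho, sigma


if __name__ == "__main__":
    weights, rho, sigma = smooth_knn_weights([1.0, 2.0, 3.0, 4.0])
    # The nearest neighbour always gets membership 1.0; the sum is close to log2(4) = 2.
    print(rho, weights, sum(weights))
```

Subtracting ρ guarantees that every point is fully connected to at least its nearest neighbour. The per-point σ adapts the scale to local density.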

Dimensionality reduction for visualizing single-cell data using UMAP

Becht, McInnes, Healy et al., 2018, Nat Biotechnol

4,075

3,171

hdbscan: Hierarchical density based clustering

McInnes, Healy, Astels, 2017, JOSS

1,604

907

Accelerated Hierarchical Density Based Clustering

McInnes, Healy, 2017

341

199

We present an accelerated algorithm for hierarchical density-based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm provides performance comparable to DBSCAN, while supporting variable-density clusters and eliminating the need for the difficult-to-tune distance scale parameter ε. This makes accelerated HDBSCAN* the default choice for density-based clustering.

arXiv:1705.07321v2 [stat.ML] 23 May 2017

Code used to generate the paper's qualitative clustering plots: https://github.com/lmcinnes/hdbscan_paper/blob/master/Qualitative%20clustering%20results.ipynb
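
HDBSCAN* avoids a single global ε by working with the mutual reachability distance, max(core_k(a), core_k(b), d(a, b)). The minimum spanning tree of that distance encodes the whole cluster hierarchy. The naive pure-Python sketch below computes core distances, the mutual reachability matrix, and its minimum spanning tree using Prim's algorithm.

It rests on several assumptions:
- Distances are Euclidean.
- min_samples counts the point itself, a convention that differs between implementations.
- All function names are hypothetical.

The sketch deliberately builds the full O(n²) matrix, which is the kind of cost an accelerated algorithm aims to avoid. It is not the paper's method or the hdbscan library's code.

```python
import math


def core_distances(points, min_samples):
    """Distance to the min_samples-th nearest neighbour, counting the point itself."""
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)  # dists[0] == 0.0 is p itself
        cores.append(dists[min_samples - 1])
    return cores


def mutual_reachability(points, min_samples):
    """Dense matrix of max(core(a), core(b), d(a, b)); O(n^2) time and memory."""
    core = core_distances(points, min_samples)
    n = len(points)
    return [
        [0.0 if i == j else max(core[i], core[j], math.dist(points[i], points[j]))
         for j in range(n)]
        for i in range(n)
    ]


def prim_mst(dist):
    """Minimum spanning tree of a dense distance matrix as (parent, child, weight) edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    tree = prim_mst(mutual_reachability(pts, min_samples=2))
    # Cutting the heaviest edge first separates the two pairs of points.
    for a, b, w in sorted(tree, key=lambda e: e[2], reverse=True):
        print(a, b, w)
```

With two well-separated pairs of points, the tree contains two edges of weight 1.0 and one edge of weight 10.0. Removing edges from the heaviest down produces the hierarchy, so the two pairs separate without any ε being chosen in advance.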

Parametric UMAP Embeddings for Representation and Semisupervised Learning

McInnes (second author), 2021

UMAP is a nonparametric graph-based dimensionality reduction algorithm that uses applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) computing a graphical representation of a data set (a fuzzy simplicial complex) and (2) optimizing a low-dimensional embedding of that graph through stochastic gradient descent. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that parametric UMAP performs comparably to its nonparametric counterpart while conferring the benefit of a learned parametric mapping (e.g., fast online embeddings for new data). We then explore UMAP as a regularizer: constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semisupervised learning by capturing structure in unlabeled data.
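
Step (2) can be made concrete with the fuzzy set cross-entropy from the UMAP paper. It compares each graph weight w with an embedding similarity q = 1 / (1 + a·d^(2b)). In the nonparametric version the embedding coordinates are optimized directly; in the parametric version a network f_θ(x) produces them, and the same loss is minimized over θ.

The sketch below makes three simplifying assumptions:
- a = b = 1; UMAP itself fits these from its min_dist setting.
- Only the listed graph edges are scored; real UMAP also samples non-edges for repulsion.
- A hypothetical one-parameter linear encoder stands in for a neural network.

```python
import math


def low_dim_similarity(y_i, y_j, a=1.0, b=1.0):
    """Embedding-space edge strength q = 1 / (1 + a * d^(2b)); a = b = 1 is an assumption."""
    d = math.dist(y_i, y_j)
    return 1.0 / (1.0 + a * d ** (2 * b))


def edge_cross_entropy(w, q, eps=1e-12):
    """Fuzzy set cross-entropy between graph weight w and embedding similarity q."""
    q = min(max(q, eps), 1.0 - eps)  # keep both logarithms finite
    loss = 0.0
    if w > 0.0:
        loss += w * math.log(w / q)  # attractive term: penalizes small q on strong edges
    if w < 1.0:
        loss += (1.0 - w) * math.log((1.0 - w) / (1.0 - q))  # repulsive term
    return loss


def encode(x, theta):
    """Hypothetical stand-in for a neural network: y = theta * x, coordinate-wise."""
    return tuple(theta * xi for xi in x)


def parametric_loss(edges, data, theta):
    """Total cross-entropy over graph edges (i, j, w) when embeddings come from encode()."""
    total = 0.0
    for i, j, w in edges:
        q = low_dim_similarity(encode(data[i], theta), encode(data[j], theta))
        total += edge_cross_entropy(w, q)
    return total


if __name__ == "__main__":
    data = [(0.0,), (1.0,)]
    edges = [(0, 1, 1.0)]  # one strong edge from step (1)
    # A smaller theta maps the two neighbours closer together, lowering the loss.
    print(parametric_loss(edges, data, 0.1), parametric_loss(edges, data, 2.0))
```

Because the loss depends on θ only through encode(), a new point can be embedded by calling encode() on it directly. This is the "fast online embeddings for new data" benefit the abstract mentions, shown here in toy form.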
