Uniform Manifold Approximation and Projection (UMAP) is a recently published non-linear dimensionality reduction technique. Another such algorithm, t-SNE, has been the default method for this task in recent years. Herein we comment on the usefulness of UMAP for high-dimensional cytometry and single-cell RNA sequencing, notably highlighting the faster runtime, consistency, meaningful organization of cell clusters and preservation of continuums of UMAP compared to t-SNE.
Introduction

The last decades have witnessed a large increase in the number of parameters analysed in single-cell cytometry studies. This number currently reaches around 20 for flow-cytometry, 40 for mass-cytometry, and more than 20,000 for single-cell RNA-sequencing. In this context, dimensionality reduction techniques have been pivotal in enabling researchers to visualize high-dimensional data. While principal component analysis has historically been the main technique used for dimensionality reduction (DR), recent years have highlighted the importance of non-linear DR techniques to avoid overcrowding issues [3]. t-SNE is currently the most commonly used technique and is efficient at highlighting local structure in the data, which for cytometry notably translates to the representation of cell populations as distinct clusters. t-SNE however suffers from limitations such as loss of large-scale information (the inter-cluster relationships), slow computation time and inability to meaningfully represent very large datasets [4]. A new algorithm, called Uniform Manifold Approximation and Projection (UMAP), has recently been published by McInnes and Healy [5]. They claim that, compared to t-SNE, it preserves as much of the local and more of the global data structure, with a shorter runtime. Since t-SNE has been extremely prevalent in the field of cytometry, broadly encompassing flow and mass-cytometry as well as single-cell RNA-sequencing (scRNAseq), we tested these claims on well-characterized single-cell datasets [6][7][8]. We confirm that UMAP is an order of magnitude faster than t-SNE. In addition to this straightforward advantage, we argue that UMAP is not only able to create informative clusters, but is also able to organize these clusters in a meaningful way.
We illustrate these claims by showing that UMAP can order clusters of T and NK cells from 8 human organs [7] in a way that identifies both major cell lineages and a common axis that broadly recapitulates their differentiation stages. We also show that UMAP allows for an easier visualization of multibranched cellular trajectories, using a mass-cytometry [6] and an scRNAseq [8] dataset that both recapitulate hematopoiesis.