Ayush Dalmia scite author profile

Ayush Dalmia

3 Publications

66 Citation Statements Received

38 Citation Statements Given

Publications

Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!

Sia, Dalmia, Mielke

2020

Topic models are a useful analysis tool to uncover the underlying themes within document collections. The dominant approach is to use probabilistic topic models that posit a generative story, but in this paper we propose an alternative way to obtain topics: clustering pretrained word embeddings while incorporating document information for weighted clustering and reranking top words. We provide benchmarks for the combination of different word embeddings and clustering algorithms, and analyse their performance under dimensionality reduction with PCA. The best performing combination for our approach performs as well as classical topic models, but with lower runtime and computational complexity.
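The idea of clustering word embeddings with document-derived weights can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not taken from the paper: the two-dimensional toy vectors, the use of corpus term counts as weights, the fixed initial centroids, and the helper names `weighted_kmeans` and `top_words`. Ranking words by distance to the centroid is a simplification and is not the reranking scheme the abstract refers to.

```python
import math

def weighted_kmeans(vectors, weights, init_centroids, iterations=10):
    """Lloyd's algorithm in which each word pulls its centroid in proportion to its weight."""
    centroids = [list(c) for c in init_centroids]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each word goes to its nearest centroid (Euclidean distance).
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid becomes the weighted mean of its members.
        for k in range(len(centroids)):
            members = [i for i, a in enumerate(assignment) if a == k]
            total = sum(weights[i] for i in members)
            if total == 0:
                continue  # keep the previous centroid if a cluster is empty
            centroids[k] = [
                sum(weights[i] * vectors[i][d] for i in members) / total
                for d in range(len(centroids[k]))
            ]
    return centroids, assignment

def top_words(words, vectors, centroids, assignment, n=3):
    """List up to n words per cluster, closest to the centroid first."""
    topics = []
    for k, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignment) if a == k]
        members.sort(key=lambda i: math.dist(vectors[i], centroid))
        topics.append([words[i] for i in members[:n]])
    return topics

# Hypothetical toy data: 2-D "embeddings" and made-up corpus term counts.
words = ["goal", "match", "team", "vote", "party", "election"]
vectors = [(0.9, 0.1), (1.0, 0.2), (0.8, 0.0), (0.1, 0.9), (0.0, 1.0), (0.2, 0.8)]
weights = [5, 3, 8, 4, 6, 2]

centroids, assignment = weighted_kmeans(
    vectors, weights, init_centroids=[vectors[0], vectors[3]]
)
topics = top_words(words, vectors, centroids, assignment)
print(topics)
```

With real pretrained embeddings, the vectors would have hundreds of dimensions and the initial centroids would typically be chosen randomly or by a seeding heuristic rather than fixed by hand.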

Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!

Sia, Dalmia, Mielke

2020

Preprint

Clustering with UMAP: Why and How Connectivity Matters

Dalmia, Sia

2021

Preprint

Topology-based dimensionality reduction methods such as t-SNE and UMAP have seen increasing success and popularity in high-dimensional data. These methods have strong mathematical foundations and are based on the intuition that the topology in low dimensions should be close to that of high dimensions. Given that the initial topological structure is a precursor to the success of the algorithm, this naturally raises the question: What makes a "good" topological structure for dimensionality reduction? In this paper, which focuses on UMAP, we study the effects of node connectivity (k-Nearest Neighbors vs mutual k-Nearest Neighbors) and relative neighborhood (Adjacent via Path Neighbors) on dimensionality reduction. We explore these concepts through extensive ablation studies on four standard image and text datasets (MNIST, FMNIST, 20NG, AG), reducing to 2 and 64 dimensions. Our findings indicate that a more refined notion of connectivity (mutual k-Nearest Neighbors with minimum spanning tree), together with a flexible method of constructing the local neighborhood (Path Neighbors), can achieve a much better representation than default UMAP, as measured by downstream clustering performance.
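The difference between k-Nearest Neighbors and mutual k-Nearest Neighbors connectivity can be seen on a tiny example. This is a sketch under stated assumptions, not the paper's implementation: the four toy points, the brute-force distance computation, and the helper names `knn_graph` and `mutual_knn_graph` are all hypothetical.

```python
import math

def knn_graph(points, k):
    """Directed k-NN graph: i -> j when j is among the k points nearest to i."""
    neighbors = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors[i] = set(others[:k])
    return neighbors

def mutual_knn_graph(neighbors):
    """Keep an edge i - j only when each endpoint lists the other as a neighbor."""
    return {
        i: {j for j in nbrs if i in neighbors[j]}
        for i, nbrs in neighbors.items()
    }

# Hypothetical toy data: a tight group of three points and one distant outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]

knn = knn_graph(points, k=2)
mutual = mutual_knn_graph(knn)

print("k-NN neighbors of the outlier:  ", knn[3])
print("mutual k-NN neighbors of outlier:", mutual[3])
```

In the plain k-NN graph the outlier still points at two members of the tight group, while in the mutual graph it has no edges at all, because none of those points list it in return. Isolated vertices of this kind are one plausible reason a mutual k-NN graph may need additional edges to stay connected; the abstract pairs mutual k-Nearest Neighbors with a minimum spanning tree.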
