2021
DOI: 10.48550/arxiv.2105.13440
Preprint

Non-negative matrix factorization algorithms greatly improve topic model fits

Abstract: We report on the potential for using algorithms for non-negative matrix factorization (NMF) to improve parameter estimation in topic models. While several papers have studied connections between NMF and topic models, none have suggested leveraging these connections to develop new algorithms for fitting topic models. Importantly, NMF avoids the "sum-to-one" constraints on the topic model parameters, resulting in an optimization problem with simpler structure and more efficient computations. Building on recent a…
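
For orientation, the "sum-to-one" constraints mentioned in the abstract can be written out as below. This is only a sketch in notation assumed here rather than taken from the preprint: $x_{ij}$ is the count for row $i$ and column $j$ of an $n \times m$ count matrix, $s_i$ is the total count in row $i$, $K$ is the number of topics, and $l_{ik}$, $f_{jk}$ are the non-negative parameters.

```latex
% Multinomial topic model: both sets of parameters must sum to one.
x_{i1}, \dots, x_{im} \sim \mathrm{Multinomial}(s_i;\, \pi_{i1}, \dots, \pi_{im}),
\qquad \pi_{ij} = \sum_{k=1}^{K} l_{ik} f_{jk},
\qquad \sum_{k=1}^{K} l_{ik} = 1,
\qquad \sum_{j=1}^{m} f_{jk} = 1.

% Poisson NMF: only non-negativity is required.
x_{ij} \sim \mathrm{Poisson}\Big(\sum_{k=1}^{K} l_{ik} f_{jk}\Big),
\qquad l_{ik} \ge 0, \qquad f_{jk} \ge 0.
```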

Cited by 14 publications (28 citation statements)
References: 21 publications
“…The fact that the tumors in our dataset create a triangle-shaped continuum in latent space suggests that each tumor can be represented as a unique mixture of three “idealized” tumor components. Therefore, in order to provide a more quantitative interpretation, we fitted a topic model with k = 3 hidden “topics” to our dataset [22] (see Materials and Methods). This allowed us to infer both the three latent topics (that presumably represent the “idealized” tumor components) and also the proportions of topics from which every single tumor is composed.…”
Section: Results (citation type: mentioning)
confidence: 99%
“…In this study, the parameters of the topic model were learned using the “fit_topic_model” function from the R package “fastTopics” [22] (version 0.4-11). The number of latent topics was set between k = 2, 3, …, 10.…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…We carried out topic model analysis on taxonomic classification profiles for each sample using the R package fastTopics [65] (https://github.com/stephenslab/fastTopics). We used the number of unique k-mers assigned to non-human genera from KrakenUniq as the observed count data for each sample, excluding genera with less than 50 unique k-mers assigned.…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…We used fastTopics to fit a topic model to the UMI counts [33, 117], with K = 16 topics. fastTopics implements the following two-step approach to fit the topic model: (1) fit a non-negative matrix factorization based on a Poisson model (“Poisson NMF”) [118]; (2) recover maximum-likelihood estimates (MLEs) of the topic model parameters by a simple reparameterization.…”
Section: Quantification and Statistical Analysis (citation type: mentioning)
confidence: 99%
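
The reparameterization in step (2) of the statement above can be sketched as follows. This is a minimal illustration in plain Python, not the fastTopics implementation (which is an R package): the function name `poisson_nmf_to_topic_model` and the toy matrices are hypothetical, the Poisson NMF fit from step (1) is assumed to be already available as non-negative factors `L` (n × K) and `F` (m × K), and every column of `F` and every row of the rescaled `L` is assumed to have a positive sum.

```python
def poisson_nmf_to_topic_model(L, F):
    """Rescale Poisson NMF factors into topic-model parameters.

    L: n x K list of lists (non-negative loadings).
    F: m x K list of lists (non-negative factors).
    Returns (L_topic, F_topic, sizes): rows of L_topic sum to one,
    columns of F_topic sum to one, and sizes[i] is the scale of row i.
    Hypothetical helper for illustration; assumes all sums are positive.
    """
    k = len(F[0])
    # Column sums of F: u[t] = sum over j of F[j][t].
    u = [sum(row[t] for row in F) for t in range(k)]
    # Rescale each column of F so that it sums to one.
    F_topic = [[row[t] / u[t] for t in range(k)] for row in F]
    # Absorb the column scales into L so the product L F^T is unchanged.
    L_scaled = [[row[t] * u[t] for t in range(k)] for row in L]
    # Normalize each row of the rescaled L to obtain topic proportions.
    sizes = [sum(row) for row in L_scaled]
    L_topic = [[v / s for v in row] for row, s in zip(L_scaled, sizes)]
    return L_topic, F_topic, sizes


# Toy factors (made-up numbers, K = 2 topics, n = 2 rows, m = 3 columns).
L = [[1.0, 2.0], [3.0, 0.5]]
F = [[0.2, 1.0], [0.3, 3.0], [0.5, 4.0]]
L_topic, F_topic, sizes = poisson_nmf_to_topic_model(L, F)
```

Under these assumptions the Poisson rates are preserved up to the per-row scale: entry (i, j) of `L` times `F` transposed equals `sizes[i]` times the corresponding topic-model probability, so no refitting is needed after the rescaling.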