Non-negative matrix factorization algorithms greatly improve topic model fits

Carbonetto, Peter; Sarkar, Abhishek; Wang, Zihao; Stephens, Matthew

doi:10.48550/arxiv.2105.13440

Cited by 14 publications

(28 citation statements)

References 21 publications


“…The fact that the tumors in our dataset create a triangle-shaped continuum in latent space suggests that each tumor can be represented as a unique mixture of three “idealized” tumor components. Therefore, in order to provide a more quantitative interpretation, we fitted a topic model with k = 3 hidden “topics” to our dataset [ 22 ] (see Materials and Methods). This allowed us to infer both the three latent topics (that presumably represent the “idealized” tumor components) and also the proportions of topics from which every single tumor is composed.…”

Section: Results (mentioning)

confidence: 99%


Characterization of Continuous Transcriptional Heterogeneity in High-Risk Blastemal-Type Wilms’ Tumors Using Unsupervised Machine Learning

Trink, Urbach, Dekel et al. 2023. IJMS

Wilms’ tumors are pediatric malignancies that are thought to arise from faulty kidney development. They contain a wide range of poorly differentiated cell states resembling various distorted developmental stages of the fetal kidney, and as a result, differ between patients in a continuous manner that is not well understood. Here, we used three computational approaches to characterize this continuous heterogeneity in high-risk blastemal-type Wilms’ tumors. Using Pareto task inference, we show that the tumors form a triangle-shaped continuum in latent space that is bounded by three tumor archetypes with “stromal”, “blastemal”, and “epithelial” characteristics, which resemble the un-induced mesenchyme, the cap mesenchyme, and early epithelial structures of the fetal kidney. By fitting a generative probabilistic “grade of membership” model, we show that each tumor can be represented as a unique mixture of three hidden “topics” with blastemal, stromal, and epithelial characteristics. Likewise, cellular deconvolution allows us to represent each tumor in the continuum as a unique combination of fetal kidney-like cell states. These results highlight the relationship between Wilms’ tumors and kidney development, and we anticipate that they will pave the way for more quantitative strategies for tumor stratification and classification.



“…In this study, the parameters of the topic model were learned using the “fit_topic_model” function from the R package “fastTopics” [ 22 ] (version 0.4-11). The number of latent topics was set between k = 2, 3, …, 10.…”

Section: Methods (mentioning)

confidence: 99%


“…We carried out topic model analysis on taxonomic classification profiles for each sample using the R package fastTopics [65] (https://github.com/stephenslab/fastTopics). We used the number of unique k-mers assigned to non-human genera from KrakenUniq as the observed count data for each sample, excluding genera with less than 50 unique k-mers assigned.…”

Section: Methods (mentioning)

confidence: 99%

The landscape of ancient human pathogens in Eurasia from the Stone Age to historical times

Sikora, Canteri, Fernandez-Guerra et al. 2023. Preprint

Summary: Infectious diseases have had devastating impacts on human populations throughout history. Still, the origins and past dynamics of human pathogens remain poorly understood [1]. To create the first spatiotemporal map of diverse ancient human microorganisms and parasites, we screened shotgun sequencing data from 1,313 ancient human remains covering 35,000 years of Eurasian history for ancient DNA deriving from bacteria, viruses, and parasites. We demonstrate the widespread presence of ancient microbial DNA in human remains, identifying over 2,400 individual species hits in 896 samples. We report a wide range of pathogens detected for the first time in ancient human remains, including the food-borne pathogens Yersinia enterocolitica and Shigella spp., the animal-borne Leptospira interrogans, and the malaria-causing parasite Plasmodium vivax. Our findings extend the spatiotemporal range of previously described ancient pathogens such as Yersinia pestis, the causative agent of plague, Hepatitis B virus, and Borrelia recurrentis, the cause of louse-borne relapsing fever (LBRF). For LBRF we increase the known distribution from a single medieval genome to 31 cases across Eurasia covering 5,000 years. Grouping the ancient microbial species according to their type of transmission (zoonotic, anthroponotic, sapronotic, opportunistic, and other), we find that most categories are identified throughout the entire sample period, while zoonotic pathogens, which are transmitted from living animals to humans or which have made a host jump into humans from animals in the timeframe of this study, are only detected from ∼6,500 years ago. The incidence of zoonotic pathogens increased in our samples some 1,000 years later before reaching the highest detection rates ∼5,000 years ago, and was associated with a human genetic ancestry component characteristic of pastoralist populations from the Eurasian Steppe.
Our results provide the first direct evidence for an epidemiological transition to an increased burden of zoonotic infectious diseases following the domestication of animals [2]. However, they also reveal that the spread of these pathogens first becomes frequent thousands of years after increased animal-human contact, likely coinciding with the pastoralist migrations from the Eurasian Steppe [3, 4]. This study provides the first spatiotemporal map of past human pathogens using genomic paleoepidemiology, and the first direct evidence for an epidemiological transition of increased zoonotic infectious disease burden after the onset of agriculture, through historical times.


“…We used fastTopics to fit a topic model to the UMI counts [33, 117], with K = 16 topics. fastTopics implements the following two-step approach to fit the topic model: (1) fit a non-negative matrix factorization based on a Poisson model (“Poisson NMF”) [118]; (2) recover maximum-likelihood estimates (MLEs) of the topic model parameters by a simple reparameterization.…”

Section: Quantification and Statistical Analysis (mentioning)

confidence: 99%
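
The "simple reparameterization" in step (2) of the statement quoted above can be illustrated with a short numerical sketch. This is not the fastTopics code: it is a NumPy illustration that assumes the Poisson NMF has already been fitted and has produced a loadings matrix `L` (samples by topics) and a factors matrix `F` (features by topics). The function name `poisson_nmf_to_topic_model` and the random toy inputs are hypothetical, chosen only for this example.

```python
import numpy as np

def poisson_nmf_to_topic_model(L, F):
    """Rescale Poisson NMF factors into topic model parameters.

    L: (n_samples, K) non-negative loadings.
    F: (n_features, K) non-negative factors.
    Returns topic proportions, topic feature frequencies, and size factors.
    """
    u = F.sum(axis=0)               # column sums of F, one per topic
    F_star = F / u                  # feature frequencies; each column sums to 1
    L_scaled = L * u                # move the column scale into the loadings
    s = L_scaled.sum(axis=1)        # per-sample size factors
    L_star = L_scaled / s[:, None]  # topic proportions; each row sums to 1
    return L_star, F_star, s

# Toy stand-ins for fitted factors (assumption: strictly positive entries).
rng = np.random.default_rng(0)
L = rng.gamma(2.0, 1.0, size=(5, 3))
F = rng.gamma(2.0, 1.0, size=(8, 3))
L_star, F_star, s = poisson_nmf_to_topic_model(L, F)
```

Under these assumptions the Poisson rates are preserved exactly, since `L @ F.T` equals `s[:, None] * (L_star @ F_star.T)`. The rescaling only separates each sample's overall size from its topic proportions, which is why no refitting is needed in the second step.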

Organism-Wide Analysis of Sepsis Reveals Mechanisms of Systemic Inflammation

Takahama, Patil, Johnson et al. 2023. Preprint. Self-citation.

Sepsis is a systemic response to infection with life-threatening consequences. Our understanding of the impact of sepsis across organs of the body is rudimentary. Here, using mouse models of sepsis, we generate a dynamic, organism-wide map of the pathogenesis of the disease, revealing the spatiotemporal patterns of the effects of sepsis across tissues. These data revealed two interorgan mechanisms key in sepsis. First, we discover a simplifying principle in the systemic behavior of the cytokine network during sepsis, whereby a hierarchical cytokine circuit arising from the pairwise effects of TNF plus IL-18, IFN-γ, or IL-1β explains half of all the cellular effects of sepsis on 195 cell types across 9 organs. Second, we find that the secreted phospholipase PLA2G5 mediates hemolysis in blood, contributing to organ failure during sepsis. These results provide fundamental insights to help build a unifying mechanistic framework for the pathophysiological effects of sepsis on the body.
