2020
DOI: 10.3390/cancers12123799

A Topic Modeling Analysis of TCGA Breast and Lung Cancer Transcriptomic Data

Abstract: Topic modeling is a widely used technique to extract relevant information from large arrays of data. The problem of finding a topic structure in a dataset was recently recognized to be analogous to the community detection problem in network theory. Leveraging on this analogy, a new class of topic modeling strategies has been introduced to overcome some of the limitations of classical methods. This paper applies these recent ideas to TCGA transcriptomic data on breast and lung cancer. The established cancer sub…

Cited by 16 publications
(45 citation statements)
References 58 publications
“…While the above results were similar to the ones already discussed in [3], the novelty of the present analysis is that we can perform a similar study also on the miRNA side. As we shall see this allows to have a new independent insight on the problem.…”
Section: Analysis Of Subtype Specific Topics Of miRNAs
Type: supporting
confidence: 90%
“…Then, by subtracting to P(topic|subtype) the mean value over the whole dataset we find a new set of quantities, that we define as "centered" distributions (we already used them in [3] and it has the same meaning of the normalised value of the mixture proportion τ in [23])…”
Section: Construction Of The P(topic|subtype) Distributions
Type: mentioning
confidence: 99%
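
The centering described in the statement above can be sketched in a few lines of Python. This is a minimal illustration and not the cited authors' code. It assumes that P(topic|subtype) is the average of per-sample topic proportions within a subtype and that the "mean value over the whole dataset" is the average over all samples. The function name `centered_topic_distributions` and the toy data are hypothetical.

```python
from collections import defaultdict


def centered_topic_distributions(sample_topics, sample_subtypes):
    """Return {subtype: [P(topic|subtype) - dataset mean, ...]}.

    sample_topics   -- list of rows, one per sample, each row holding
                       P(topic|sample) for every topic (assumed input format)
    sample_subtypes -- list of subtype labels, aligned with sample_topics
    """
    n_topics = len(sample_topics[0])
    n_samples = len(sample_topics)

    # Mean topic proportion over the whole dataset (all samples).
    dataset_mean = [
        sum(row[t] for row in sample_topics) / n_samples
        for t in range(n_topics)
    ]

    # Group the per-sample rows by subtype label.
    rows_by_subtype = defaultdict(list)
    for row, subtype in zip(sample_topics, sample_subtypes):
        rows_by_subtype[subtype].append(row)

    centered = {}
    for subtype, rows in rows_by_subtype.items():
        # P(topic|subtype), taken here as the within-subtype average.
        subtype_mean = [
            sum(r[t] for r in rows) / len(rows) for t in range(n_topics)
        ]
        # Subtract the dataset-wide mean to obtain the centered values.
        centered[subtype] = [
            m - g for m, g in zip(subtype_mean, dataset_mean)
        ]
    return centered


# Toy example: four samples, two topics, two subtypes (made-up numbers).
toy_topics = [[0.8, 0.2], [0.6, 0.4], [0.2, 0.8], [0.4, 0.6]]
toy_subtypes = ["A", "A", "B", "B"]
toy_centered = centered_topic_distributions(toy_topics, toy_subtypes)
for subtype, values in sorted(toy_centered.items()):
    print(subtype, [round(v, 3) for v in values])
```

Under these assumptions, a positive centered value marks a topic that is over-represented in a subtype relative to the dataset as a whole, and a negative value marks an under-represented one.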
“…LDA was first introduced by Blei's study in 2003 (Hofmann, 1999; Blei et al, 2003). Scholars in cell and developmental biology have applied the LDA model to identify scientific research topics (Li et al, 2015; Valle et al, 2020). Besides, perplexity is considered as a standard tool to evaluate the effectiveness of various natural language processing models (Rosen-Zvi et al, 2010).…”
Section: Methods
Type: mentioning
confidence: 99%
“…In cancer transcriptomics, ML and DL models have been applied to classify different cancer subtypes and cell populations [17, 18, 19, 20], characterize tumor immune microenvironment [21, 22, 23, 24, 25], discover new prognostic biomarkers [26, 27, 28], assess and predict disease recurrence and patient survival [29, 30, 31, 32], identify new putative actionable vulnerabilities [33, 34], and predict tumor antigen immunogenicity [35] (Figure 2).…”
Section: AI In the Era Of Transcriptomic Big Data
Type: mentioning
confidence: 99%
“…Developed for natural language processing, this probabilistic clustering algorithm aims at discovering the hidden “topics” that reflect the biological heterogeneity and enhancing its comprehensive interpretation [109]. Applied to breast and lung cancer RNA-seq datasets, topic modeling outperformed standard clustering algorithms in identifying subtype-specific molecular features and their corresponding clinical outcomes [20].…”
Section: AI Mining Of Cancer Transcriptomes
Type: mentioning
confidence: 99%