Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007)
DOI: 10.1109/icdmw.2007.39

Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models

Abstract: Statistical topic models such as the Latent Dirichlet Allocation (LDA)…

Cited by 7 publications (5 citation statements)
References 13 publications
Citing publications: 2010–2024
“…With respect to topic models estimated with bigrams [MacKay and Peto, 1995; Wallach, 2006; Wang et al., 2007; Yan et al., 2013; Nallapati et al., 2007], one of the advantages of LDA2Net is that its results do not depend on any assumption about the distribution family and data-generation process of bigrams, since LDA2Net is based on the observed frequencies of bigrams in documents, which are combined (a posteriori) with LDA output matrices, as shown in Figure 1.…”
Section: Discussion (mentioning)
confidence: 99%
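
The a-posteriori combination described in this statement can be made concrete with a small sketch. The snippet below is an illustrative reading of the idea, not the authors' implementation: the names (beta, theta, bigram_counts) and the specific weighting scheme are assumptions made here for demonstration.

    # A minimal sketch, assuming an LDA model has already been fitted elsewhere.
    # beta, theta, and the weighting scheme are illustrative guesses, not the
    # LDA2Net formulation from the cited paper.
    import numpy as np

    rng = np.random.default_rng(0)
    V, K, D = 6, 2, 3                            # vocabulary, topics, documents

    beta = rng.dirichlet(np.ones(V), K)          # stand-in topic-word matrix (K x V)
    theta = rng.dirichlet(np.ones(K), D)         # stand-in document-topic matrix (D x K)
    bigram_counts = rng.poisson(1.0, (D, V, V))  # observed bigram counts per document

    # Combine the observed bigram frequencies with the LDA output matrices
    # a posteriori: weight each document's bigrams by the document's topic
    # proportion and by the topic probabilities of both words.
    topic_bigrams = np.zeros((K, V, V))
    for k in range(K):
        word_weight = np.outer(beta[k], beta[k])  # P(w_i|k) * P(w_j|k)
        for d in range(D):
            topic_bigrams[k] += theta[d, k] * bigram_counts[d] * word_weight

    # Each topic_bigrams[k] can be read as a topic-specific word network.
    print(topic_bigrams[0].round(3))

The point the citing papers make is visible here: no generative model of bigrams is fitted; the observed counts are simply reweighted after LDA has run.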
“…Attempts to include and model syntagmatic information (i.e. information concerning sequential relations between words) in topic models have already been investigated in MacKay and Peto [1995], Wallach [2006], Wang et al. [2007], Yan et al. [2013], Nallapati et al. [2007].…”
Section: Introduction (mentioning)
confidence: 99%
“…Attempts to include and model syntagmatic information (i.e. information concerning sequential relations between words) in topic models have already been investigated in [11-15] as well. Moreover, using a Dirichlet distribution implicitly assumes that few items overwhelmingly contribute to each topic.…”
Section: Related Work (mentioning)
confidence: 99%
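
The Dirichlet-sparsity claim in the statement above is easy to check numerically. The snippet below is a quick illustration written here for clarity; the parameter choice (50 items, alpha = 0.1) is an assumption, not taken from the cited work.

    # Draws from a symmetric Dirichlet with concentration alpha < 1 put most
    # of their mass on a handful of items, i.e. topics come out near-sparse.
    import numpy as np

    rng = np.random.default_rng(1)
    draw = rng.dirichlet(np.full(50, 0.1))   # 50 items, alpha = 0.1
    top5 = np.sort(draw)[-5:]                # the 5 largest probabilities
    print(f"mass on the 5 largest of 50 items: {top5.sum():.2f}")
    # With alpha = 0.1 this typically prints a value well above 0.5.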
“…Concerning topic models estimated with bigrams [11-15], one of the advantages of LDA2Net is that its results do not depend on any assumption about the distribution family and data-generation process of bigrams, since LDA2Net is based on the observed frequencies of bigrams in documents, which are combined (a posteriori) with LDA output matrices, as shown in Fig. 1.…”
Section: Related Work (mentioning)
confidence: 99%
“…In any language there exists a semantic structure among words, which leads to correlations between words in constituting the meaning of an expression and, more specifically, correlations between their roles in a natural function β. This structure has been studied as the semantic correlation of words [17,29,16] in a machine learning context. Given a correlation structure on model coefficients, assuming model sparsity and imposing an ℓ1-norm penalty are no longer appropriate.…”
Section: Model Compression: Correlation (mentioning)
confidence: 99%
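
To make the last point concrete: a plain lasso objective penalizes coefficients independently, while a correlation-aware penalty couples them. The second objective below is one standard formulation (a graph-guided fusion penalty); it is offered as an illustration of the general idea, not as the formulation used in the citing paper.

    % Plain lasso: each coefficient is penalized in isolation.
    \min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1

    % A correlation-aware variant: E is a set of correlated word pairs with
    % weights w_{ij}; the fusion term pulls correlated coefficients together.
    \min_{\beta}\; \|y - X\beta\|_2^2
        + \lambda_1 \|\beta\|_1
        + \lambda_2 \sum_{(i,j)\in E} w_{ij}\, |\beta_i - \beta_j|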