Mining Text Data 2012
DOI: 10.1007/978-1-4614-3223-4_5
Dimensionality Reduction and Topic Modeling: From Latent Semantic Indexing to Latent Dirichlet Allocation and Beyond

Cited by 94 publications (61 citation statements)
References 39 publications
“…This is based on the assumptions that the importance of a word is proportional to the number of occurrences of this word in a document and inversely proportional to the number of documents in which the word occurred. The TF-IDF method can be used to find keywords from the text, but it fails to find a connection between semantically convergent documents that utilize different vocabularies, and more-complex methods need to be applied (such as Topic Modeling [7]). …”
Section: Methods of Text Analysis (mentioning, confidence: 99%)
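The TF-IDF weighting described in the excerpt above (term weight proportional to in-document frequency, inversely proportional to document frequency) can be sketched in a few lines. The toy documents and tokenization are illustrative assumptions, not part of the cited work.

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF weights for a list of tokenized documents.

    Weight = (term count / doc length) * log(N / docs containing term).
    """
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

# Hypothetical three-document corpus.
docs = [["apple", "banana", "apple"],
        ["banana", "cherry"],
        ["cherry", "cherry", "apple"]]
w = tf_idf(docs)
```

Note that a term occurring in every document gets weight zero (log 1 = 0), which is exactly the down-weighting of uninformative common words the excerpt describes.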
“…Moreover, topic modeling can be used to reduce the representation of documents (e.g., using topics as features describing the text instead of all words). There are two main branches of Topic Modeling [7]: algorithms based on Singular Value Decomposition (such as Latent Semantic Indexing [13]) and those based on probabilistic generative processes [5] (such as Latent Dirichlet Allocation [3,4]). …”
Section: Methods of Text Analysis (mentioning, confidence: 99%)
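The SVD branch can be illustrated with a minimal Latent Semantic Indexing sketch, assuming NumPy and a hypothetical four-term, four-document corpus: truncated SVD projects documents into a low-rank latent space, where documents sharing no terms can still come out similar if their terms co-occur elsewhere.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents d0..d3).
X = np.array([
    [2, 0, 1, 0],   # car
    [0, 2, 1, 0],   # auto
    [1, 1, 2, 0],   # engine
    [0, 0, 0, 3],   # flower
], dtype=float)

# LSI: keep the k largest singular values, projecting documents
# into a shared k-dimensional latent space.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dim row per document

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# d0 ("car") and d1 ("auto") share no terms, yet LSI places them
# close because both co-occur with "engine" in d2; d3 ("flower")
# stays far from d0.
sim_01 = cos(doc_vecs[0], doc_vecs[1])
sim_03 = cos(doc_vecs[0], doc_vecs[3])
```

This is the synonymy effect LSI is known for: similarity is recovered through shared latent dimensions rather than shared surface terms.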
“…The methods can group documents and words but cannot construct a knowledge structure, as a topic model does, that allows documents and words to overlap across different topics. Crain et al. (2012) combine Latent Semantic Indexing (LSI) and LDA for topic modeling, and discuss the applicability of these methods to large collections of text.…”
Section: Related Work (mentioning, confidence: 99%)
“…Statistical semantic models, such as latent semantic analysis (LSA) [3], probabilistic latent semantic analysis (PLSA) [4], and latent Dirichlet allocation (LDA) [5], are powerful tools for mining underlying topics in a document collection. PLSA is the probabilistic version of LSA, and both of them compute the similarities between documents from co-occurrences of terms, such as TF-IDF [6], but they ignore the order relationships between terms.…”
Section: Related Work (mentioning, confidence: 99%)
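LDA's probabilistic generative process (the "probabilistic version" lineage mentioned above) can be sketched with the standard library alone. The two-topic vocabulary and fixed topic-word distributions below are invented for illustration; real LDA infers these distributions from data rather than fixing them.

```python
import random

random.seed(0)

def dirichlet(alpha):
    """Sample from a Dirichlet distribution via independent Gammas."""
    gs = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(gs)
    return [g / total for g in gs]

def pick(probs):
    """Draw an index from a discrete distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Hypothetical vocabulary and two fixed topic-word distributions.
vocab = ["ball", "game", "vote", "law"]
topics = [[0.5, 0.5, 0.0, 0.0],   # "sports" topic
          [0.0, 0.0, 0.5, 0.5]]   # "politics" topic

def generate_doc(n_words, alpha=(0.5, 0.5)):
    theta = dirichlet(alpha)      # per-document topic mixture
    words = []
    for _ in range(n_words):
        z = pick(theta)           # choose a topic for this word
        w = pick(topics[z])       # choose a word from that topic
        words.append(vocab[w])
    return words

doc = generate_doc(8)
```

Inference in LDA runs this story in reverse: given only the generated words, recover the topic mixtures and topic-word distributions.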
“…With these term frequency vectors, researchers could compute the similarities between documents and thus discover underlying topics in such a document collection [2]. Topic models can be classified into statistical semantic models [3][4][5][6][7][8][9][10] and embedded vector models [11][12][13]. While capturing the semantics of documents, statistical semantic models compute the similarities between documents with a co-occurrence matrix of terms, and embedded vector models use neighbor(s) to represent the meaning of a target term; however, neither of them can describe the term order in a document.…”
Section: Introduction (mentioning, confidence: 99%)
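The order-blindness of term-frequency representations noted above is easy to demonstrate: bag-of-words vectors assign identical representations to sentences that differ only in word order. The example sentences are, of course, illustrative.

```python
import math
from collections import Counter

def bow_cosine(doc_a, doc_b):
    """Cosine similarity of bag-of-words term-frequency vectors."""
    ca, cb = Counter(doc_a), Counter(doc_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

# Word order is invisible to the representation: these two
# sentences get similarity 1.0 despite opposite meanings.
s1 = "dog bites man".split()
s2 = "man bites dog".split()
sim = bow_cosine(s1, s2)   # sim ≈ 1.0
```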