Mining Text Data 2012
DOI: 10.1007/978-1-4614-3223-4_5
Dimensionality Reduction and Topic Modeling: From Latent Semantic Indexing to Latent Dirichlet Allocation and Beyond

Cited by 94 publications (61 citation statements)
References 39 publications
“…This is based on the assumptions that the importance of a word is proportional to the number of occurrences of this word in a document and inversely proportional to the number of documents in which the word occurred. The TF-IDF method can be used to find keywords from the text, but it fails to find a connection between semantically convergent documents that utilize different vocabularies, and more-complex methods need to be applied (such as Topic Modeling [7]). …”
Section: Methods of Text Analysis (mentioning, confidence: 99%)
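The TF-IDF weighting described in the excerpt above (term weight proportional to in-document frequency, inversely proportional to document frequency) can be sketched in a few lines. The toy documents and tokenization are illustrative assumptions, not part of the cited work.

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF weights for a list of tokenized documents.

    Weight = (term count / doc length) * log(N / docs containing term).
    """
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

# Hypothetical three-document corpus.
docs = [["apple", "banana", "apple"],
        ["banana", "cherry"],
        ["cherry", "cherry", "apple"]]
w = tf_idf(docs)
```

Note that a term occurring in every document gets weight zero (log 1 = 0), which is exactly the down-weighting of uninformative common words the excerpt describes.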
“…Moreover, topic modeling can be used to reduce the representation of documents (e.g., using topics as features describing the text instead of all words). There are two main branches of Topic Modeling [7]: algorithms based on Singular Value Decomposition (such as Latent Semantic Indexing [13]) and those based on probabilistic generative processes [5] (such as Latent Dirichlet Allocation [3,4]). …”
Section: Methods of Text Analysis (mentioning, confidence: 99%)
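The SVD branch can be illustrated with a minimal Latent Semantic Indexing sketch, assuming NumPy and a hypothetical four-term, four-document corpus: truncated SVD projects documents into a low-rank latent space, where documents sharing no terms can still come out similar if their terms co-occur elsewhere.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents d0..d3).
X = np.array([
    [2, 0, 1, 0],   # car
    [0, 2, 1, 0],   # auto
    [1, 1, 2, 0],   # engine
    [0, 0, 0, 3],   # flower
], dtype=float)

# LSI: keep the k largest singular values, projecting documents
# into a shared k-dimensional latent space.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dim row per document

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# d0 ("car") and d1 ("auto") share no terms, yet LSI places them
# close because both co-occur with "engine" in d2; d3 ("flower")
# stays far from d0.
sim_01 = cos(doc_vecs[0], doc_vecs[1])
sim_03 = cos(doc_vecs[0], doc_vecs[3])
```

This is the synonymy effect LSI is known for: similarity is recovered through shared latent dimensions rather than shared surface terms.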
“…The methods can group documents and words but cannot construct a knowledge structure, as a topic model does, that allows documents and words to overlap across different topics. Crain et al. (2012) combine Latent Semantic Indexing (LSI) and LDA for topic modeling, and discuss the applicability of these methods to large collections of text.…”
Section: Related Work (mentioning, confidence: 99%)
“…Statistical semantic models, such as latent semantic analysis (LSA) [3], probabilistic latent semantic analysis (PLSA) [4], and latent Dirichlet allocation (LDA) [5], are powerful tools for mining underlying topics in a document collection. PLSA is the probabilistic version of LSA, and both of them compute the similarities between documents from co-occurrences of terms, such as TF-IDF [6], but they ignore the order relationships between terms.…”
Section: Related Work (mentioning, confidence: 99%)
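LDA's probabilistic generative process (the "probabilistic version" lineage mentioned above) can be sketched with the standard library alone. The two-topic vocabulary and fixed topic-word distributions below are invented for illustration; real LDA infers these distributions from data rather than fixing them.

```python
import random

random.seed(0)

def dirichlet(alpha):
    """Sample from a Dirichlet distribution via independent Gammas."""
    gs = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(gs)
    return [g / total for g in gs]

def pick(probs):
    """Draw an index from a discrete distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Hypothetical vocabulary and two fixed topic-word distributions.
vocab = ["ball", "game", "vote", "law"]
topics = [[0.5, 0.5, 0.0, 0.0],   # "sports" topic
          [0.0, 0.0, 0.5, 0.5]]   # "politics" topic

def generate_doc(n_words, alpha=(0.5, 0.5)):
    theta = dirichlet(alpha)      # per-document topic mixture
    words = []
    for _ in range(n_words):
        z = pick(theta)           # choose a topic for this word
        w = pick(topics[z])       # choose a word from that topic
        words.append(vocab[w])
    return words

doc = generate_doc(8)
```

Inference in LDA runs this story in reverse: given only the generated words, recover the topic mixtures and topic-word distributions.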
“…With these term frequency vectors, researchers could compute the similarities between documents and thus discover underlying topics in such a document collection [2]. Topic models can be classified into statistical semantic models [3][4][5][6][7][8][9][10] and embedded vector models [11][12][13]. While capturing the semantics of documents, statistical semantic models compute the similarities between documents with a co-occurrence matrix of terms, and embedded vector models use neighbor(s) to represent the meaning of a target term; however, neither of them can describe the term order in a document.…”
Section: Introduction (mentioning, confidence: 99%)
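The order-blindness of term-frequency representations noted above is easy to demonstrate: bag-of-words vectors assign identical representations to sentences that differ only in word order. The example sentences are, of course, illustrative.

```python
import math
from collections import Counter

def bow_cosine(doc_a, doc_b):
    """Cosine similarity of bag-of-words term-frequency vectors."""
    ca, cb = Counter(doc_a), Counter(doc_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

# Word order is invisible to the representation: these two
# sentences get similarity 1.0 despite opposite meanings.
s1 = "dog bites man".split()
s2 = "man bites dog".split()
sim = bow_cosine(s1, s2)   # sim ≈ 1.0
```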