Interactive topic modeling

Hu, Yu; Boyd-Graber, Jordan; Satinoff, Brianna

doi:10.1007/s10994-013-5413-0

Cited by 253 publications

(191 citation statements)

References 31 publications

Supporting

Mentioning

189

Contrasting

Unclassified


“…The generative process of nI-cLDA is as follows. It is essentially the same as (Hu et al, 2014) 1. For each topic k …”

Section: A Dataset Preprocessing (mentioning)

confidence: 99%

“…• nI-cLDA, non-interactive constrained Latent Dirichlet Allocation, a variant of ITM (Hu et al, 2014), where constraints are inferred by applying k-means to external word embeddings. Each resulting word cluster is then regarded as a constraint.…”

Section: Datasets and Models Description (mentioning)

confidence: 99%

A Latent Concept Topic Model for Robust Topic Inference Using Word Embeddings

Hu, Tsujii

2016

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Uncovering thematic structures of SNS and blog posts is a crucial yet challenging task, because of the severe data sparsity induced by the short length of texts and diverse use of vocabulary. This hinders effective topic inference of traditional LDA because it infers topics based on document-level co-occurrence of words. To robustly infer topics in such contexts, we propose a latent concept topic model (LCTM). Unlike LDA, LCTM reveals topics via co-occurrence of latent concepts, which we introduce as latent variables to capture conceptual similarity of words. More specifically, LCTM models each topic as a distribution over the latent concepts, where each latent concept is a localized Gaussian distribution over the word embedding space. Since the number of unique concepts in a corpus is often much smaller than the number of unique words, LCTM is less susceptible to the data sparsity. Experiments on the 20Newsgroups show the effectiveness of LCTM in dealing with short texts as well as the capability of the model in handling held-out documents with a high degree of OOV words.

“…Our tool is complementary to this large body of work, and supports real-world deployment of these techniques. Interactive topic modeling (Hu et al, 2014) can play a key role to help users not only verify model consistency but actively curate high-quality codes; its inclusion is beyond the scope of a single conference paper. While supervised learning (Settles, 2011) has been applied to content analysis, it represents the application of a pre-defined coding scheme to a text corpus, which is different from the task of devising a coding scheme and assessing its reliability.…”

Section: Reproducibility Of a Coding Process (mentioning)

confidence: 99%

TopicCheck: Interactive Alignment for Assessing Topic Model Stability

Chuang, Roberts, Stewart, et al.

2015

Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua


Content analysis, a widely-applied social science research method, is increasingly being supplemented by topic modeling. However, while the discourse on content analysis centers heavily on reproducibility, computer scientists often focus more on scalability and less on coding reliability, leading to growing skepticism on the usefulness of topic models for automated content analysis. In response, we introduce TopicCheck, an interactive tool for assessing topic model stability. Our contributions are threefold. First, from established guidelines on reproducible content analysis, we distill a set of design requirements on how to computationally assess the stability of an automated coding process. Second, we devise an interactive alignment algorithm for matching latent topics from multiple models, and enable sensitivity evaluation across a large number of models. Finally, we demonstrate that our tool enables social scientists to gain novel insights into three active research questions.


“…Topic models like LDA rely on parameters that, while there are methods for doing so, can not easily be estimated through computation alone. Often, some emerging topics will be nonsensical to a human user [9]. Through interactivity, a topic model can be guided towards achieving more meaningful results.…”

Section: Topic Models and Interactivity (mentioning)

confidence: 99%

“…Here, the FOL constraints are similar to the must link and cannot link constraints of [3], but defined on word-pairs rather than documents. In some cases, real-time interactive knowledge injection has been applied, such as in [9], where the authors have used similar concepts as in [12] to create a framework allowing users to iteratively and interactively improve topic modeling results. While the work in [12] and [9] are general-purpose solutions, many of the specialised variations of LDA which incorporate domain knowledge are custom-built, single-purpose methods.…”

Section: A Human Knowledge Injection In Topic Models (mentioning)

confidence: 99%

A Survey On Interactivity in Topic Models

Kjellin, Yan

2016

IJACSA


Trying to make sense and gain deeper insight from large sets of data is becoming a task very central to computer science in general. Topic models, capable of uncovering the semantic themes pervading through large collections of documents, have seen a surge in popularity in recent years. However, topic models are high level statistical tools; their output is given in terms of probability distributions, suited neither for simple interpretation nor deep analysis. Interpreting the fitted topic models in an intuitive manner requires visual and interactive tools. Additionally, some measure of human interaction is typically required for refining the output offered by such models. In the research, this area remains relatively unexplored; only recently has this aspect been receiving more attention. In this paper, the literature is surveyed as it pertains to interactivity and visualisation within the context of topic models, with the goal of finding current research trends in this area.

