Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 2013
DOI: 10.1145/2484028.2484062

An unsupervised topic segmentation model incorporating word order

Abstract: We present a new unsupervised topic discovery model for a collection of text documents. In contrast to the majority of the state-of-the-art topic models, our model does not break the document's structure such as paragraphs and sentences. In addition, it preserves word order in the document. As a result, it can generate two levels of topics of different granularity, namely, segment-topics and word-topics. In addition, it can generate n-gram words in each topic. We also develop an approximate inference scheme us…

Cited by 54 publications (35 citation statements)
References 37 publications

Citation statements:
“…The approaches cannot distinguish between "Daniele loves Victoria" and "Victoria loves Daniele", but only represent the meaning as pertaining to love. However, there is research attempting to address this problem by incorporating word order into the model (e.g., see Jameel & Lam, 2013).…”
Section: Evaluation of Word Count Strategies and Statistical Semantics (mentioning; confidence: 99%)
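The limitation quoted above can be made concrete with a minimal, self-contained sketch (a toy example, not code from the cited paper): the two sentences have identical bag-of-words counts, while a simple bigram count that respects adjacent word order keeps them apart.

```python
# Toy illustration of the word-order problem (not the paper's model):
# a bag-of-words view of the two sentences is identical, while a
# bigram view preserves enough local order to tell them apart.
from collections import Counter

s1 = "Daniele loves Victoria".lower().split()
s2 = "Victoria loves Daniele".lower().split()

def bag_of_words(tokens):
    """Unordered word counts: all positional information is discarded."""
    return Counter(tokens)

def bigrams(tokens):
    """Counts of adjacent word pairs: local word order is retained."""
    return Counter(zip(tokens, tokens[1:]))

print(bag_of_words(s1) == bag_of_words(s2))  # True  -> indistinguishable
print(bigrams(s1) == bigrams(s2))            # False -> distinguishable
```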
“…The performance of this method on large unstructured text data was not satisfactory, and constructing the matrix consumed a significant amount of time. A new model that performs phrase segmentation along with topic modeling was introduced [8], but it failed to work with datasets containing a large number of files. TopMine [9], a system for mining topical phrases, was introduced recently; it implements a two-step process that discovers phrases in the text and then trains a traditional topic model such as LDA.…”
Section: State of the Art in Concept Extraction (mentioning; confidence: 99%)
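The two-step pipeline the statement attributes to TopMine (phrase discovery first, then a conventional topic model) can be sketched with off-the-shelf components; the snippet below is a rough approximation using gensim's phrase detector and LDA on an invented toy corpus, not TopMine's actual algorithm.

```python
# Rough sketch of the two-step idea attributed to TopMine above:
# (1) mine frequent adjacent-word phrases and merge them into single tokens,
# (2) train a conventional topic model (here, gensim's LDA) on the result.
# Off-the-shelf stand-ins only; this is not TopMine's own phrase-mining algorithm.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Phrases

docs = [                                   # invented toy corpus
    "topic models ignore word order".split(),
    "phrase mining finds frequent word sequences".split(),
    "word order matters for topic segmentation".split(),
]

# Step 1: detect frequent phrases (e.g. "word_order") and merge them into tokens.
phrase_model = Phrases(docs, min_count=1, threshold=1)
phrased_docs = [phrase_model[doc] for doc in docs]

# Step 2: train ordinary LDA on the phrase-augmented documents.
dictionary = Dictionary(phrased_docs)
corpus = [dictionary.doc2bow(doc) for doc in phrased_docs]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, random_state=0)

for topic_id, terms in lda.show_topics(num_words=5, formatted=False):
    print(topic_id, [term for term, _ in terms])
```

Because phrases are merged before the topic model is trained, multi-word expressions appear as single vocabulary items in the topics, which is the gist of the "phrase segmentation along with topic modeling" approach the statement describes.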
“…For example, the topical n-gram model (TNG) introduced by Wang et al. (2007) models unigram and n-gram phrases as a mixture of topics based on the nearby word context. More recently, Jameel & Lam (2013) proposed an LDA extension that uses word sequence information to generate topic distributions over n-grams and performs topic segmentation using segment and paragraph information. While these and many other approaches offer better and more realistic modeling of word sequences, they do not model topical variations across document sections in either mono- or multilingual collections.…”
Section: Introduction (mentioning; confidence: 99%)
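As a very rough illustration of "topic distributions over n-grams" (not the TNG model and not the Jameel & Lam model, both of which condition on word context and document structure), one can simply feed unigram and bigram counts to a vanilla LDA so that each topic places probability mass directly on bigrams; the corpus and parameter choices below are invented for illustration.

```python
# Crude approximation of topics over n-grams (NOT TNG or the Jameel & Lam model):
# treat unigrams and bigrams as features and fit a plain LDA, so topics
# assign weight directly to bigrams as well as single words.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [                                   # invented toy corpus
    "word order matters in topic segmentation",
    "topic models usually ignore word order",
    "bigram features capture local word order",
]

vectorizer = CountVectorizer(ngram_range=(1, 2))   # unigrams + bigrams
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):      # per-topic term weights
    top = weights.argsort()[::-1][:5]
    print(f"topic {k}:", [terms[i] for i in top])
```

The difference noted in the statement is that TNG and the Jameel & Lam model decide contextually when adjacent words form a phrase, whereas this sketch enumerates all bigrams up front.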