2014
DOI: 10.14778/2735508.2735519
Scalable topical phrase mining from text corpora

Abstract: While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on the inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases of mixed lengths. Existing work either performs post-processing on the results of unigram-based topic models, or utilizes complex n-gram-discovery topic models. These methods generally produce low-quality topical phrases or suffer from poor scalability on even moderately sized datasets. We propose a di…

Cited by 165 publications (139 citation statements)
References 14 publications
“…Experimental results show that this algorithm generates topics that are more interpretable than those of the traditional LDA model. [9] This algorithm first extracts phrases using a method similar to frequent pattern mining, and then trains a modified LDA model on the "bag-of-phrases" input. It is found to perform better than the topical n-gram model (TNG) and a number of phrase-discovering topic models such as the phrase-discovering topic model (PDLDA)…”
Section: Evaluation of Results
Mentioning confidence: 99%
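The phrase-extraction step described in the statement above is a form of contiguous frequent pattern mining. A minimal sketch of that idea, using Apriori-style pruning (an n-gram is only counted if its (n-1)-prefix is already frequent); the corpus, thresholds, and function names here are illustrative, not taken from the paper:

```python
from collections import Counter

def mine_frequent_phrases(docs, min_support=2, max_len=3):
    """Contiguous frequent pattern mining over token sequences:
    keep an n-gram only if it meets min_support, and grow longer
    candidates only from frequent shorter ones (downward closure)."""
    frequent = {}
    # Start with frequent unigrams.
    counts = Counter(tok for doc in docs for tok in doc)
    prev = {(w,): c for w, c in counts.items() if c >= min_support}
    frequent.update(prev)
    n = 2
    while prev and n <= max_len:
        counts = Counter()
        for doc in docs:
            for i in range(len(doc) - n + 1):
                gram = tuple(doc[i:i + n])
                # Count only candidates whose (n-1)-prefix is frequent.
                if gram[:-1] in prev:
                    counts[gram] += 1
        prev = {g: c for g, c in counts.items() if c >= min_support}
        frequent.update(prev)
        n += 1
    return frequent

docs = [
    "topic model for text mining".split(),
    "a topic model of text".split(),
    "text mining with a topic model".split(),
]
phrases = mine_frequent_phrases(docs, min_support=2)
# e.g. ("topic", "model") appears in all three documents
```

The mined phrases can then be treated as single tokens when training the downstream topic model.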
“…A new model that performs phrase segmentation jointly with topic modeling was introduced [8], but it fails to scale to datasets with a large number of documents. ToPMine [9], a recently introduced system for topical phrase mining, implements a two-step process: discovering phrases from the text, then training a traditional topic model such as LDA on the result. The system's assumption that all words in a phrase must be assigned the same topic often does not hold in practical scenarios.…”
Section: State of the Art in Concept Extraction
Mentioning confidence: 99%
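The "bag-of-phrases" input mentioned above can be built by segmenting each document with the mined phrases, so that each phrase acts as a single LDA token (which is precisely what forces all of a phrase's words to share one topic). A minimal greedy sketch; the phrase set, tokenizer, and joining convention are illustrative assumptions:

```python
def to_bag_of_phrases(tokens, phrases, max_len=3):
    """Greedy left-to-right segmentation: at each position take the
    longest known phrase; each phrase then becomes a single token,
    so all of its words implicitly share one topic assignment in LDA."""
    bag, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):
            gram = tuple(tokens[i:i + n])
            if len(gram) == n and gram in phrases:
                bag.append("_".join(gram))
                i += n
                break
        else:
            # No known phrase starts here; emit the word on its own.
            bag.append(tokens[i])
            i += 1
    return bag

# Illustrative phrase set, e.g. produced by a phrase-mining step.
known = {("topic", "model"), ("text", "mining")}
bag = to_bag_of_phrases("a topic model of text mining".split(), known)
# bag == ["a", "topic_model", "of", "text_mining"]
```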
“…Phrase mining / chunking: statistical pattern mining [El-Kishky et al., 2015; Danilevsky et al., 2014]; supervised chunking trained on the Penn Treebank. Topic hierarchy / taxonomy construction: combining statistical pattern mining with information networks [Wang et al., 2014]; lexical/syntactic patterns (e.g., the COLING 2014 workshop on taxonomy construction). Entity linking: graph alignment [Li et al., 2013]; TAC-KBP entity linking methods and Wikification. Relation discovery: hierarchical clustering [Wang et al., 2012]; ACE relation extraction and bootstrapping. Sentiment analysis…”
Section: NLP Methods, Phrase Mining / Chunking
Mentioning confidence: 99%
“…We further apply n-gram testing techniques [30, 10, 11] to select from N the n-grams with the strongest word collocation for each n ≥ 2, in order to obtain more salient n-grams as candidates. Compared to NLP techniques such as chunking and dependency parsing, n-gram testing methods do not rely on any model training and are domain-independent.…”
Section: Candidate Generation
Mentioning confidence: 99%
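The statement above does not specify which n-gram tests [30, 10, 11] are used; a standard example of such a collocation test is the Student's t-score for bigrams, which compares the observed bigram probability against the probability expected if the two words were independent. A minimal sketch on an illustrative toy corpus:

```python
import math
from collections import Counter

def t_score(bigram, unigrams, bigrams, n_tokens):
    """Student's t-score for bigram collocation: (observed - expected)
    probability, normalized by an estimate of the standard deviation
    (the Bernoulli variance is approximated by the observed mean)."""
    w1, w2 = bigram
    observed = bigrams[bigram] / n_tokens
    expected = (unigrams[w1] / n_tokens) * (unigrams[w2] / n_tokens)
    return (observed - expected) / math.sqrt(observed / n_tokens)

tokens = "new york is a big city and new york never sleeps".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
n_tokens = len(tokens)

# A true collocation should score higher than an incidental bigram.
t_new_york = t_score(("new", "york"), unigrams, bigrams, n_tokens)
t_is_a = t_score(("is", "a"), unigrams, bigrams, n_tokens)
```

In candidate generation, n-grams whose test statistic exceeds a chosen threshold would be kept as salient phrase candidates; no model training is involved, which is what makes such tests domain-independent.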