2018
DOI: 10.48550/arxiv.1808.03733
Preprint

Familia: A Configurable Topic Modeling Framework for Industrial Text Engineering

Abstract: In the last decade, a variety of topic models have been proposed for text engineering. However, apart from Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA), most existing topic models are seldom applied or considered in industrial scenarios. This is largely because very few convenient tools have so far been available to support these topic models. Intimidated by the expertise and labor demanded by designing and implementing parameter inference algorithms, software engineers…

Cited by 5 publications (4 citation statements) | References 28 publications
“…Discussion of GDS: Since GDS is only utilized during the training process, we calculate the relevance score between each contextual utterance and the ground-truth response. After applying Familia [20] over the entire conversation, the relevance scores are 0.1502, 0.1388, 0.1602, 0.1548, 0.0979, 0.1343, and 0.1638 for X1 to X7, which is consistent with human intuition. Besides, inspired by Zhang et al. [6], we randomly sample 300 context-response pairs from JDDC.…”
Section: Results (supporting)
confidence: 74%
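A minimal sketch of the relevance computation described in the statement above, assuming that relevance is measured as cosine similarity between per-document topic mixtures. gensim's LdaModel stands in for Familia, and the utterances, response text, and hyperparameters are hypothetical placeholders rather than the cited study's setup.

# Minimal sketch (gensim stands in for Familia; utterances and settings are hypothetical):
# score each contextual utterance against the response by cosine similarity of topic mixtures.
from gensim import corpora, models
from gensim.matutils import cossim

utterances = [
    "hi i ordered a phone last week",
    "the screen arrived cracked",
    "can i get a replacement or a refund",
]
response = "sorry about that, we will arrange a replacement for you"

# Naive whitespace tokenisation; a real pipeline would use proper segmentation.
texts = [doc.split() for doc in utterances + [response]]
dictionary = corpora.Dictionary(texts)
bows = [dictionary.doc2bow(tokens) for tokens in texts]

# Train a small topic model over the conversation plus the response.
lda = models.LdaModel(bows, id2word=dictionary, num_topics=5, passes=10, random_state=0)

response_topics = lda.get_document_topics(bows[-1], minimum_probability=0.0)
for i, bow in enumerate(bows[:-1], start=1):
    utterance_topics = lda.get_document_topics(bow, minimum_probability=0.0)
    relevance = cossim(utterance_topics, response_topics)
    print(f"X_{i} relevance to response: {relevance:.4f}")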
“…The kappa value is 0.568, which indicates moderate consistency among different annotators. We then pick out samples that are labeled the same by at least two annotators, and calculate the kappa value between the human judgement and the outputs from Familia [20] on these cases. The value 0.863 reflects "substantial agreement" between them.…”
Section: Results (mentioning)
confidence: 99%
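A minimal sketch of the agreement computation in the statement above, assuming binary relevance labels; the label vectors are made-up placeholders, not data from the cited work, and scikit-learn's cohen_kappa_score supplies the kappa values.

# Minimal sketch with made-up binary labels: Cohen's kappa between two annotators,
# and between a human judgement and a model's output, via scikit-learn.
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
annotator_b = [1, 0, 1, 0, 0, 1, 1, 1, 1, 0]
model_output = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]

kappa_humans = cohen_kappa_score(annotator_a, annotator_b)        # inter-annotator agreement
kappa_human_model = cohen_kappa_score(annotator_a, model_output)  # human vs. model agreement

print(f"human-human kappa: {kappa_humans:.3f}")
print(f"human-model kappa: {kappa_human_model:.3f}")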
“…Despite the achieved progress, to conduct topic modeling for a new corpus, the state-of-the-art methods typically train a new topic model from scratch. Empirically, a massive amount of in-domain data and considerable human labour are usually involved in obtaining a high-quality topic model [6]. Considering the great effort made in training a high-quality topic model, it is desirable to develop a framework that can take full advantage of previously well-trained topic models and transfer the knowledge in these models to the scenario of topic modeling on a new corpus, in order to save cost and improve effectiveness.…”
Section: Introduction (mentioning)
confidence: 99%