2015
DOI: 10.1007/s10791-015-9254-2

Supervised topic models with word order structure for document classification and retrieval learning

Abstract: One limitation of most existing probabilistic latent topic models for document classification is that the topic model itself does not consider useful side information, namely the class labels of documents. Supervised topic models, which do consider this side information, in turn ignore the word order structure in documents. One motivation for considering word order structure is to capture the semantic fabric of the document. We investigate a low-dimensional lat…

Cited by 17 publications (8 citation statements)
References 100 publications (110 reference statements)
“…We have adopted the same preprocessing strategy as for the categorization task, with the exception of OHSUMED, for which suitable LTR features are already given. For all other datasets we used the Terrier LTR framework to generate the six standard LTR document features as described in (Jameel et al., 2015). The document vectors were then concatenated with these six features.…”
Section: Document Embedding Results (mentioning)
confidence: 99%
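
The concatenation step mentioned in this statement can be sketched as follows. This is a minimal illustration only: the function name `concat_features` and all numeric values are hypothetical, and the six features here are placeholders rather than the actual features produced by Terrier.

```python
def concat_features(doc_vector, ltr_features):
    """Append six learning-to-rank features to a document vector.

    Both inputs are plain sequences of floats. The check on the number
    of LTR features mirrors the "six standard features" in the quoted
    statement; it is an assumption of this sketch, not a Terrier rule.
    """
    if len(ltr_features) != 6:
        raise ValueError("expected exactly six LTR features")
    return list(doc_vector) + list(ltr_features)


# Hypothetical values, for illustration only.
doc_vector = [0.12, -0.40, 0.88]
ltr_features = [1.0, 0.5, 0.0, 2.3, 0.1, 0.7]
combined = concat_features(doc_vector, ltr_features)
```

The combined vector has the embedding dimensions first and the six LTR features last, so a downstream ranker sees both in one input.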
“…Based on the frequency of occurrence of the index term, this scheme attempts to find the most likely category into which new documents are supposed to be categorized. [6][7] As for the representative methods in the statistic document classification technique, there are two main methods: (1) the method to use Bayesian probability and (2) the method to use vector similarity. In the Bayesian probability based method, probabilities that documents are classified to each category are estimated whenever the index terms extracted from an arbitrary document appear.…”
Section: Related Studies (mentioning)
confidence: 99%
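
The Bayesian probability based method described in this statement can be illustrated with a small multinomial naive Bayes classifier. This is a sketch under stated assumptions, not the cited paper's implementation: the add-one (Laplace) smoothing, the function names `train_nb` and `classify_nb`, and the toy training data are all choices made here for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled_docs):
    """Count documents per class and index-term occurrences per class."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for terms, label in labeled_docs:
        class_docs[label] += 1
        term_counts[label].update(terms)
        vocab.update(terms)
    return class_docs, term_counts, vocab


def classify_nb(terms, model):
    """Return the class with the highest log posterior for the given terms."""
    class_docs, term_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        total = sum(term_counts[label].values())
        # Log prior: fraction of training documents in this class.
        score = math.log(class_docs[label] / n_docs)
        for t in terms:
            if t in vocab:  # terms never seen in training are skipped
                # Add-one smoothed likelihood of the term given the class.
                score += math.log(
                    (term_counts[label][t] + 1) / (total + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, hypothetical.
training = [
    (["goal", "match"], "sports"),
    (["election", "vote"], "politics"),
]
model = train_nb(training)
predicted = classify_nb(["vote"], model)
```

Working in log space avoids numerical underflow when many term probabilities are multiplied together.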
“…We learn the parameters of the model using the training data (75%), and report the perplexity results on the held-out data (25%). For the parametric topic models, we use a tuning set to determine the number of topics following the tuning procedure described in [13]. Our objective is to compare how well our model has learned all parameters and how it performs in terms of its generalization ability.…”
Section: Lifestyle Pattern Quality Evaluation (mentioning)
confidence: 99%
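
The evaluation protocol in this statement (fit on 75% of the data, report perplexity on the held-out 25%) can be sketched as below. A smoothed unigram model stands in for the topic model purely to keep the example short; that substitution, the helper names, and the toy corpus are assumptions of this sketch. Perplexity is computed as the exponential of the negative average per-token log-likelihood.

```python
import math
from collections import Counter


def split_train_heldout(docs, train_fraction=0.75):
    """Split a list of tokenized documents into training and held-out parts."""
    cut = int(len(docs) * train_fraction)
    return docs[:cut], docs[cut:]


def fit_unigram(train_docs, vocab):
    """Add-one smoothed unigram model (a stand-in for a topic model)."""
    counts = Counter(tok for doc in train_docs for tok in doc)
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def perplexity(docs, word_probs):
    """exp(-(sum of token log-probabilities) / number of tokens)."""
    log_lik = 0.0
    n_tokens = 0
    for doc in docs:
        for tok in doc:
            log_lik += math.log(word_probs[tok])
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# Toy corpus, hypothetical.
docs = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "a"]]
vocab = {"a", "b", "c"}
train_docs, heldout_docs = split_train_heldout(docs)
model = fit_unigram(train_docs, vocab)
ppl = perplexity(heldout_docs, model)
```

Lower held-out perplexity indicates better generalization; a model that spreads probability uniformly over a vocabulary of size V has perplexity V.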