2014 IEEE Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2014.178

Topic Modeling of Multimodal Data: An Autoregressive Approach

Abstract: Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice to deal with multimodal data, such as in image annotation tasks. Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for text document modeling. In this work, we show how to successfully apply and extend this model to multimodal data, such as simultaneous image classification and annotation. Specifically, w…

Cited by 58 publications (49 citation statements)
References: 9 publications
“…The proposed method is compared on image classification with the following five methods: sLDA-ann [32], abc-corr-LDA [24], SupDocNADE [38], SAGE [12] and MedSTC [37]. Although SAGE and MedSTC only consider image visual information, they add constraints to topic model for obtaining sparse topics, we compare their results for image classification.…”
Section: Methods (mentioning)
confidence: 99%
“…Zheng et al [38] proposed the supervised document neural autoregressive distribution estimator (SupDocNADE) model to simultaneously deal with the image classification and annotation tasks. It obtains the hidden topic features by using the neural network on the mixed representation with all visual and annotation words, and learns the connection between hidden layer and the class label.…”
Section: Related Work (mentioning)
confidence: 99%
“…We then extracted visual words using a k-means algorithm, resulting in 1,000 visual word types. For the UIUC-Sports dataset, we used the same settings as used by Zheng et al [12]. We set the grid size to 8×8 pixels and set the scale to 16 pixels.…”
Section: Experimental Settings (mentioning)
confidence: 99%
“…UIUC-Sports consists of 8 classes about sports. We used the dataset used by Zheng et al [12], where the number of images for each class ranges from 137 (bocce) to 330 (croquet), and the total number of images is 1,792.…”
Section: Datasets (mentioning)
confidence: 99%