Proceedings of the 2nd ACM Workshop on Social Web Search and Mining 2009
DOI: 10.1145/1651437.1651447

Cross-language linking of news stories on the web using interlingual topic modelling

Abstract: We have studied the problem of linking event information across different languages without the use of translation systems or dictionaries. The linking is based on interlingua information obtained through probabilistic topic models trained on comparable corpora written in two languages (in our case English and Dutch). To achieve this, we expand the Latent Dirichlet Allocation model to process documents in two languages. We demonstrate the validity of the learned interlingual topics in a document clustering ta…
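To illustrate the linking idea sketched in the abstract, the snippet below represents each story by its inferred per-document topic distribution and links stories across languages by similarity in topic space. This is a minimal sketch, not the authors' implementation; the function names, the cosine measure, and the 0.5 threshold are assumptions.

import numpy as np

# Cosine similarity between two per-document topic distributions (theta vectors).
def topic_similarity(theta_a, theta_b):
    return float(np.dot(theta_a, theta_b) /
                 (np.linalg.norm(theta_a) * np.linalg.norm(theta_b) + 1e-12))

# Pair every English story with its most similar Dutch story in the shared topic space.
def link_stories(english_thetas, dutch_thetas, threshold=0.5):
    links = []
    for i, th_en in enumerate(english_thetas):
        sims = [topic_similarity(th_en, th_nl) for th_nl in dutch_thetas]
        j = int(np.argmax(sims))
        if sims[j] >= threshold:
            links.append((i, j, sims[j]))
    return links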

Cited by 39 publications (27 citation statements)
References 22 publications
“…In order to tackle this issue, recent work relies on the supervision-lighter framework of multilingual probabilistic topic modeling (MuPTM) (Mimno, Wallach, Naradowsky, Smith, & McCallum, 2009; Boyd-Graber & Blei, 2009; De Smet & Moens, 2009; Ni, Sun, Hu, & Chen, 2009; Zhang, Mei, & Zhai, 2010; Fukumasu, Eguchi, & Xing, 2012) or other similar models for latent structure induction (Haghighi, Liang, Berg-Kirkpatrick, & Klein, 2008; Daumé III & Jagarlamudi, 2011).…”
Section: Bilingual Word Representations From Document-aligned Data
confidence: 99%
“…More recent work on multilingual probabilistic topic modeling (MuPTM) (Mimno et al., 2009; De Smet & Moens, 2009; Vulić et al., 2011) showed that word representations of higher quality may be built if a multilingual topic model such as bilingual LDA (BiLDA) is trained jointly on document-aligned comparable corpora by retaining the structure of the corpus intact (i.e., there is no need to construct pseudo-bilingual documents).…”
Section: Basic MuPTM
confidence: 99%
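To make the statement above concrete, here is a small sketch (an illustration under our own assumptions, not code from the cited papers) of how cross-lingual word representations can be read off a trained bilingual topic model: each word is represented by its distribution over the K shared topics, so words from both languages live in the same K-dimensional space.

import numpy as np

# phi: K x V matrix for one language, with phi[k, w] = P(word w | topic k).
def word_vectors(phi):
    # Column-normalise to get P(topic | word) under a uniform topic prior;
    # transposing gives one K-dimensional vector per word (shape V x K).
    p_topic_given_word = phi / (phi.sum(axis=0, keepdims=True) + 1e-12)
    return p_topic_given_word.T

# Words from different languages are directly comparable because both vectors
# are defined over the same K shared topics.
def cross_lingual_word_similarity(vec_en, vec_nl):
    return float(np.dot(vec_en, vec_nl) /
                 (np.linalg.norm(vec_en) * np.linalg.norm(vec_nl) + 1e-12))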
“…Another potential solution to diversity is the use of topic models, such as latent Dirichlet allocation (LDA), where the similarity between two documents is measured via the similarity of the latent topics they share rather than by direct content comparison [3]. Recently, based on seminal work on multilingual topic modeling [22], multimodal extensions of LDA were proposed for cross-modal video hyperlinking [4], combining the potential for diversity offered by topic models and by multimodality. As for BiDNN, bag-of-words representations of words from automatic transcripts and of visual concepts in keyframes are used in bimodal LDA (BiLDA).…”
Section: Bidirectional Deep Neural Network
confidence: 99%
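The bag-of-words inputs mentioned in the statement above can be prepared along the following lines (a hypothetical helper, not the pipeline of [4]): each video segment is reduced to two bags of words, one over transcript tokens and one over detected visual concept labels, and the pair is treated as an aligned "document pair" for the bimodal topic model.

from collections import Counter

def segment_to_bows(transcript_tokens, visual_concepts):
    # transcript_tokens: words from the ASR transcript of one video segment
    # visual_concepts:   concept labels detected in the segment's keyframes
    return Counter(transcript_tokens), Counter(visual_concepts)

# Example with illustrative data:
# text_bow, vis_bow = segment_to_bows(["election", "vote", "results"],
#                                     ["crowd", "flag", "podium"])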
“…As a result, documents are seen as mixtures of latent topics, while topics are probability distributions over words. The multimodal extension in [4] considers that each latent topic is defined by two probability distributions, one over each modality (or language in [22]). The BiLDA model is thus trained on parallel documents, assuming that the underlying topic distribution is common to the two modalities.…”
Section: Bidirectional Deep Neural Network
confidence: 99%
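The generative story described in this statement can be written compactly as follows; this is a sketch of the usual bilingual/bimodal LDA formulation consistent with the description above, in our own notation rather than that of [4] or [22]:

\begin{align*}
&\phi_k^{(m)} \sim \mathrm{Dirichlet}(\beta) && \text{for each topic } k = 1,\dots,K \text{ and modality/language } m \in \{1,2\},\\
&\theta_d \sim \mathrm{Dirichlet}(\alpha) && \text{for each aligned document (or segment) pair } d,\\
&z_{d,i}^{(m)} \sim \mathrm{Multinomial}(\theta_d),\quad w_{d,i}^{(m)} \sim \mathrm{Multinomial}\bigl(\phi_{z_{d,i}^{(m)}}^{(m)}\bigr) && \text{for each token } i \text{ on side } m \text{ of pair } d.
\end{align*}

Each topic k thus carries two word (or concept) distributions, one per language or modality, while the aligned pair shares a single topic mixture \theta_d, which is what ties the two sides together during training.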
“…We have in-house software that implements this model. We gloss over much of the details, but more information may be found in [29], [30].…”
confidence: 99%