Proceedings of the Fourth ACM International Conference on Web Search and Data Mining 2011
DOI: 10.1145/1935826.1935887
Cross lingual text classification by mining multilingual topics from Wikipedia

Cited by 48 publications (45 citation statements)
References 17 publications
“…Bilingual word representations could serve as an useful source knowledge for problems in cross-lingual information retrieval (Levow, Oard, & Resnik, 2005; Vulić, De Smet, & Moens, 2013), statistical machine translation (Wu, Wang, & Zong, 2008), document classification (Ni, Sun, Hu, & Chen, 2011; Klementiev et al, 2012; Hermann & Blunsom, 2014b; Chandar, Lauly, Larochelle, Khapra, Ravindran, Raykar, & Saha, 2014; Vulić, De Smet, Tang, & Moens, 2015), bilingual lexicon extraction (Tamura, Watanabe, & Sumita, 2012; Vulić & Moens, 2013a), or knowledge transfer and annotation projection from resource-rich to resource-poor languages for a myriad of NLP tasks such as dependency parsing, POS tagging, semantic role labeling or selectional preferences (Yarowsky & Ngai, 2001; Padó & Lapata, 2009; Peirsman & Padó, 2010; Das & Petrov, 2011; Täckström, Das, Petrov, McDonald, & Nivre, 2013; Ganchev & Das, 2013; Tiedemann, Agić, & Nivre, 2014; Xiao & Guo, 2014). Other interesting application domains are machine translation (e.g., Zou, Socher, Cer, & Manning, 2013; Wu, Dong, Hu, Yu, He, Wu, Wang, & Liu, 2014; Zhang, Liu, Li, Zhou, & Zong, 2014) and cross-lingual information retrieval (e.g., .…”
Section: Bilingual Word Embeddings (mentioning)
confidence: 99%
“…For instance, the multilingual thesaurus EUROVOC (created by the European Commission's Publications Office) was used in [18] for document similarity purposes; however, EUROVOC utilizes less than 6 000 descriptors, which leads to evident limits in semantic coverage. Furthermore, other knowledge bases such as EuroWordNet [20] only utilize lexicographic information, while conversely studies that focus on Wikipedia (e.g., [16,2]) cannot profitably leverage on lexical ontology knowledge.…”
Section: Background On BabelNet (mentioning)
confidence: 99%
“…This is partly explained by the increased popularity of tools for collaboratively editing through contributors across the world, which eases the production of different language-written documents, leading to a new phenomenon of multilingual information overload. Analyzing multilingual document collections is getting increased attention as it can support a variety of tasks, such as building translation resources [20,14], detection of plagiarism in patent collections [1], cross-lingual document similarity and multilingual document classification [18,16,6,2,5].…”
Section: Introduction (mentioning)
confidence: 99%
“…These translations are usually obtained through machine translation techniques based on a selected anchor language. Conversely, a comparable corpus is a collection of multilingual documents written over the same set of classes (Ni et al, 2011;Yogatama and Tanaka-Ishii, 2009) without any restriction about translation or perfect correspondence between documents. To mine this kind of corpus, external knowledge is employed to map concepts or terms from a language to another (Kumar et al, 2011c;Kumar et al, 2011a), which enables the extraction of crosslingual document correlations.…”
Section: Introduction (mentioning)
confidence: 99%