Proceedings of the Seventeenth Australasian Document Computing Symposium 2012
DOI: 10.1145/2407085.2407099

An English-translated parallel corpus for the CJK Wikipedia collections

Abstract: In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named the CJK2E Wikipedia XML corpus. The corpus could support both the information retrieval research community and knowledge sharing in Wikipedia in many ways; for example, it could be used for experiments in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermo…

Citations: cited by 1 publication (1 citation statement)
References: 3 publications
“…This work was on a broad range of document computing and information retrieval topics including: language identification [9]; analysis of parliamentary question time [21]; the release of a new corpus for cross-lingual information retrieval [19]; concept based information retrieval using subsumption relations [23]; the identification of entity-related attribute-value pairs in documents [5]; term dependencies in query expansion [18]; similarity of document signatures [22]; and public health information dissemination [17].…”
Section: Posters (citation type: mentioning)
confidence: 99%