Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL '09), 2009
DOI: 10.3115/1609067.1609161

Feature-based method for document alignment in comparable news corpora

Abstract: In this paper, we present a feature-based method to align documents with similar content across two sets of bilingual comparable corpora from daily news texts. We evaluate the contribution of each individual feature and investigate the incorporation of these diverse statistical and heuristic features for the task of bilingual document alignment. Experimental results on the English-Chinese and English-Malay comparable news corpora show that our proposed Discrete Fourier Transform-based term frequency distributio…

Cited by 25 publications (16 citation statements)
References: 12 publications
“…Vu et al (2009) achieve a precision of 31.5% in English-Chinese corpora, and 63.4% in English-Malay corpora on Top-1 retrieval test.…”
Section: Related Work (mentioning)
confidence: 93%
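
The Top-1 retrieval precision quoted above can be illustrated with a minimal sketch. The citing statement does not spell out the evaluation protocol, so this assumes the simplest reading: for each source document, the highest-ranked candidate counts as a hit if it is a gold-standard alignment. The function name `top1_precision` and the document identifiers are hypothetical.

```python
def top1_precision(ranked_candidates, gold_alignments):
    """Fraction of source documents whose top-ranked candidate is a correct alignment.

    ranked_candidates: dict mapping a source document id to a list of target
        document ids, best candidate first (assumed input format).
    gold_alignments: dict mapping a source document id to the set of target
        document ids judged to be correct alignments (assumed input format).
    """
    if not ranked_candidates:
        return 0.0
    hits = 0
    for source_id, candidates in ranked_candidates.items():
        # A source document with no candidates counts as a miss.
        if candidates and candidates[0] in gold_alignments.get(source_id, set()):
            hits += 1
    return hits / len(ranked_candidates)


# Hypothetical toy data: three English source documents, Chinese candidates.
ranked = {"en1": ["zh3", "zh1"], "en2": ["zh2"], "en3": []}
gold = {"en1": {"zh1"}, "en2": {"zh2"}}
score = top1_precision(ranked, gold)
```

Here only `en2` has a correct top candidate, so the score is one hit out of three source documents.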
“…Steinberger et al (2002) represent European document contents using descriptor terms of a multilingual thesaurus EUROVOC and measure the semantic similarity based on the distance between the two documents' representations. Vu et al (2009) use Discrete Fourier Transform (DFT) score of a word's frequency chain to measure time distribution similarity as a feature to evaluate the bilingual news similarity.…”
Section: Related Work (mentioning)
confidence: 99%
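
The idea of comparing words by the DFT of their frequency chains can be sketched as follows. This is not the exact scoring used by Vu et al. (2009), whose formula is not given here. It assumes a frequency chain is a list of per-day counts for one term, and it swaps in a plain cosine similarity between DFT magnitude spectra as the comparison. The names `dft_magnitudes` and `dft_similarity` are hypothetical.

```python
import cmath
import math


def dft_magnitudes(freq_chain):
    """Magnitudes of the discrete Fourier transform of a term's daily counts."""
    n = len(freq_chain)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(freq_chain)))
        for k in range(n)
    ]


def dft_similarity(chain_a, chain_b):
    """Cosine similarity between the DFT magnitude spectra of two chains.

    Both chains are assumed to cover the same number of days.
    """
    mag_a = dft_magnitudes(chain_a)
    mag_b = dft_magnitudes(chain_b)
    norm_a = math.sqrt(sum(m * m for m in mag_a))
    norm_b = math.sqrt(sum(m * m for m in mag_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a term that never occurs has no usable spectrum
    return sum(a * b for a, b in zip(mag_a, mag_b)) / (norm_a * norm_b)


# Hypothetical counts over one week for a term and its candidate translation.
english_term = [0, 3, 5, 1, 0, 0, 0]
# Same burst, reported two days later (circular shift of the chain).
translated_term = [0, 0, 0, 3, 5, 1, 0]
similarity = dft_similarity(english_term, translated_term)
```

Magnitude spectra are unchanged by a circular time shift, so in this sketch two terms with the same burst shape score close to 1 even when one corpus reports the event a couple of days later.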
“…Talvensaari et al proposed a method to find similar documents based on translated words in the source language to the target language by using the main keywords [4]. Munteanu and Marcu [8] and Vu et al [9] found important words by using meta-information of documents and bilingual dictionary, and then they judged similar documents by frequency of words.…”
Section: Related Work (mentioning)
confidence: 99%
“…Through the Pearson correlation coefficient to calculate word relevance, combining the term frequency and inverse document frequency calculation of source and target language text similarity. Vu filtered portions of the document according to news published time window and title & content, used the monolingual term distribution to calculate text similarity [12]. These methods use lexical translation to improve the quality of comparable corpus, but reduce the efficiency of the algorithm execution.…”
Section: Introduction (mentioning)
confidence: 99%