2006
DOI: 10.1007/s10590-007-9029-7

Finding translations for low-frequency words in comparable corpora

Abstract: Statistical methods to extract translational equivalents from non-parallel corpora hold the promise of ensuring the required coverage and domain customisation of lexicons as well as accelerating their compilation and maintenance. A challenge for these methods are rare, less common words and expressions, which often have low corpus frequencies. However, it is rare words such as newly introduced terminology and named entities that present the main interest for practical lexical acquisition. In this article, we s…

Cited by 18 publications (11 citation statements)
References 23 publications
“…Some of these works include dictionary learning and identifying word translations (Rapp 1995; Fung and Yee 1998; Sadat et al 2003; Pekar et al 2006; Xabier et al 2008), finding translation equivalents (Bennison and Bowker 2000; Chiao and Zweigenbaum 2002; Sharoff et al 2006), named entity translation/transliteration (Huang et al 2005; Alegria et al 2006; Sproat and Zhai 2006), extracting phrasal alignments (Kumano et al 2007), mining name translations (Ji 2009), word sense disambiguation (Kaji 2003), parallel fragment extraction (Quirk et al 2007), cross language IR (CLIR) (Talvensaari 2008), extracting lay paraphrases of specialized expressions (Deléger and Zweigenbaum 2009), language and translation model adaptation (Hildebrand et al 2005; Snover et al 2008) and improving SMT performance using extracted parallel sentences (Munteanu and Marcu 2005; Schwenk 2009a, 2009b; Lu et al 2010). This article describes a method for exploiting comparable news corpora to produce more parallel texts and eventually improve SMT system performance.…”
Section: Introduction (mentioning)
confidence: 99%
“…Most of the work in this line (Rapp 1999; Fung and McKeown 1997; Bouamor et al 2012), including our own work (Pekar et al 2006), covers single words and not multiword expressions. According to the distributional similarity premise, translation equivalents share common words in their contexts and this applies also to multiword expressions.…”
Section: Translation Of Multiword Expressions: Methodology and Evaluation (mentioning)
confidence: 92%
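
The distributional similarity premise quoted above can be illustrated with a minimal sketch. It is not the method of Pekar et al (2006) or of any citing paper: the function names, the window size, the raw co-occurrence counts, and the toy seed dictionary are all assumptions made for illustration. It builds a context vector for a source word, maps that vector into the target language through a seed dictionary, and ranks candidate translations by cosine similarity.

```python
import math
from collections import Counter


def context_vector(tokens, target, window=2):
    """Count the words that occur within +/- `window` tokens of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def translate_vector(vec, seed_dict):
    """Map context words into the target language; unknown words are dropped."""
    mapped = Counter()
    for word, n in vec.items():
        if word in seed_dict:
            mapped[seed_dict[word]] += n
    return mapped


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(src_vec, tgt_tokens, candidates, seed_dict, window=2):
    """Rank target-language candidates by context similarity, best first."""
    mapped = translate_vector(src_vec, seed_dict)
    scored = [
        (cand, cosine(mapped, context_vector(tgt_tokens, cand, window)))
        for cand in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): a tiny English text, a tiny German text, a seed dictionary.
en_tokens = ["drink", "hot", "coffee", "now"]
de_tokens = ["trinken", "heiss", "kaffee", "jetzt", "kalt", "wasser", "hier"]
seed = {"hot": "heiss", "now": "jetzt", "drink": "trinken"}

src_vec = context_vector(en_tokens, "coffee", window=1)
ranking = rank_candidates(src_vec, de_tokens, ["kaffee", "wasser"], seed, window=1)
print(ranking[0][0])
```

With counts this small the vectors are very sparse, which is one way to see why low-frequency words are harder to match: a rare word contributes few context observations, so its similarity scores rest on little evidence.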
“…Figure 6 shows that words that appear with higher frequency in our monolingual corpora tend to be translated better. Pekar et al (2006) also investigated the effects of frequency on finding translations from comparable corpora. This makes sense since we have more robust statistics when constructing their vector representations.…”
Section: Learning Translations Of Unseen Words (mentioning)
confidence: 99%