Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 1 - EMNLP '09 2009
DOI: 10.3115/1699510.1699560

Improved statistical machine translation using monolingually-derived paraphrases

Abstract: Untranslated words still constitute a major problem for Statistical Machine Translation (SMT), and current SMT systems are limited by the quantity of parallel training texts. Augmenting the training data with paraphrases generated by pivoting through other languages alleviates this problem, especially for the so-called "low density" languages. But pivoting requires additional parallel texts. We address this problem by deriving paraphrases monolingually, using distributional semantic similarity measures, thus p…

Citations: cited by 79 publications (68 citation statements)
References: 28 publications (42 reference statements)
“…18 Our framework also makes it possible to compare human paraphrases with those obtained by automatic methods (e.g., Bannard and Callison-Burch [2005], Callison-Burch et al [2006], Callison-Burch [2008], and Marton et al [2009]) on a potentially large scale, which may help improve both our own collaborative translation process and also the state-of-the-art in automatic paraphrasing. More generally, any component in Figure 2 that is represented by a rectangle can be a task for either humans or machines, which means that any such component can serve both as a source of data for evaluation and development of automated methods and as a testbed for those methods.…”
Section: Discussion (mentioning)
confidence: 99%
“…There are many ongoing attempts to develop MT systems for Indian languages (Antony, 2013; Kunchukuttan et al, 2014; Sreelekha et al, 2014; Sreelekha et al, 2015) using both rule based and statistical approaches. There were many attempts to improve the quality of Statistical MT systems such as; using Monolingually-Derived Paraphrases (Marton et al, 2009), using Related Resource-Rich languages (Nakov and Ng, 2012). Considering the large amount of human effort and linguistic knowledge required for developing rule based systems, statistical MT systems became a better choice in terms of efficiency.…”
Section: Related Work (mentioning)
confidence: 99%
“…Therefore, many approaches to augment parallel sentences using paraphrasing have been proposed [5], [6]. Moreover, Callison-Burch et al [7], [8] proposed a method which augments a translation phrase table using paraphrases which are automatically acquired from parallel corpora [9]. However, there is no work which augments input sentences by paraphrasing and representing these paraphrases in lattices.…”
Section: Related Work (mentioning)
confidence: 99%