2010
DOI: 10.5715/jnlp.17.3_101

Paraphrasing Training Data for Statistical Machine Translation

Abstract: Large amounts of data are essential for training statistical machine translation systems. In this paper we show how training data can be expanded by paraphrasing one side of a parallel corpus. The new data is made by parsing then generating using an open-source, precise HPSG-based grammar. This gives sentences with the same meaning, but with minor variations in lexical choice and word order. In experiments paraphrasing the English in the Tanaka Corpus, a freely-available Japanese-English parallel corpus, we sh…

Cited by 5 publications (4 citation statements)
References: 11 publications
“…This method was done by just using the paraphrases as a means for data augmentation on the source side, such as reported by Nichols et al (2010), to leverage SMT systems [19]. All of the paraphrases and their original sentence were combined, and the target sentence was duplicated by the number of multiple paraphrases.…”
Section: Combining All Data In a Single Model
Citation type: mentioning
confidence: 99%
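
The augmentation step described in the citation statement above (each paraphrase is paired with a copy of the original target sentence) can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function name `augment_parallel_corpus`, the `paraphrase` callback, and the toy lookup table are all hypothetical, and the paper itself produces paraphrases by parsing and generating with an HPSG-based grammar rather than by table lookup.

```python
from typing import Callable, Iterable, List, Tuple


def augment_parallel_corpus(
    pairs: Iterable[Tuple[str, str]],
    paraphrase: Callable[[str], List[str]],
) -> List[Tuple[str, str]]:
    """Expand a parallel corpus by paraphrasing one side.

    Each original pair is kept, and every distinct paraphrase of the
    first element is paired with a duplicate of the second element.
    `paraphrase` is a hypothetical stand-in for any paraphrase generator.
    """
    augmented: List[Tuple[str, str]] = []
    for source, target in pairs:
        augmented.append((source, target))
        seen = {source}  # skip paraphrases identical to the original or repeated
        for variant in paraphrase(source):
            if variant not in seen:
                seen.add(variant)
                augmented.append((variant, target))
    return augmented


# Toy paraphrase table (an assumption for illustration only).
TOY_PARAPHRASES = {
    "I went to the shop.": ["I went to the store.", "I went to the shop."],
}


def toy_paraphrase(sentence: str) -> List[str]:
    """Return canned paraphrases, or an empty list if none are known."""
    return TOY_PARAPHRASES.get(sentence, [])


corpus = [
    ("I went to the shop.", "私は店に行きました。"),
    ("Hello.", "こんにちは。"),
]
augmented = augment_parallel_corpus(corpus, toy_paraphrase)
for source, target in augmented:
    print(source, "|||", target)
```

Sentences for which the generator returns nothing are passed through unchanged, so the augmented corpus is always a superset of the original.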
“…Relevant studies have also confirmed that the use of CL rules can have a positive impact on MT output (Aikawa et al 2007; Bernth 1998; Mitamura and Nyberg 1995; Mitamura 1999). One of the objectives of this study is to summarize a set of language control rules, which can be used to retell or edit the source language sentences before translation, reduce ambiguity and improve the quality of machine translation [3,4,5]. The steps of designing controlled language rules are as follows:…”
Section: Controlling Language
Citation type: mentioning
confidence: 99%
“…As shown in Fig 2, the multiple pivot method is used to obtain the candidate paraphrase process, which is assumed to be composed of N translation engines, and assumes that each engine can handle M languages. The system can input any source language fragment S through any of the N inputs, and then obtain the representation of the target language fragment T from any of the N output engines [9,10]. The multiple pivot based paraphrase generation method has the following two advantages.…”
Section: Retrieval Of Rehearsal Resources Based On Multi
Citation type: mentioning
confidence: 99%