Proceedings of the Second Workshop on Statistical Machine Translation - StatMT '07 2007
DOI: 10.3115/1626355.1626388

Experiments in domain adaptation for statistical machine translation

Abstract: The special challenge of the WMT 2007 shared task was domain adaptation. We took this opportunity to experiment with various ways of adapting a statistical machine translation system to a special domain (here: news commentary) when most of the training data is from a different domain (here: European Parliament speeches). This paper also describes the University of Edinburgh's submission to the shared task.

Citations: cited by 209 publications (158 citation statements)
References: 7 publications
“…Some authors attempt to combine the predictions of two separate (in-domain and general-domain) translation models [16][17][18][19] or language models [20]. Wu and Wang [21] use in-domain data to improve word alignment in the training phase.…”
Section: Domain Adaptation (mentioning)
confidence: 99%
“…To make this information explicitly available, we follow the approach of Koehn and Schroeder [20] and train two independent translation models (phrase tables). The first experiment is based on the P5 configuration of parallel data: one phrase table is trained using in-domain sections of the training data (dictionaries and medical corpora) and the second using data selected from the general domain.…”
Section: Optimization Of Phrase Table Configuration (mentioning)
confidence: 99%
“…Other authors [24,3] studied different ways to combine bilingual or only source data from different domains. The use of clustering to extract sub-domains to build more specific language or translation models has also been studied in [54,43].…”
Section: Adaptation (mentioning)
confidence: 99%
“…Following the ideas in [1], one of the first works was performed in [7], where the authors added cache language and translation models to an interactive machine translation system. In [3], different ways to combine the available data belonging to two different sources were studied. The work in [8] explores alignment model mixtures as a way of performing topic adaptation.…”
Section: Related Work (mentioning)
confidence: 99%
“…Adaptation has become a very popular issue in natural language processing [1,2,3], and more specifically in statistical machine translation (SMT) [4]. Typically, the adaptation problem arises when two very different sets of training data are available, yielding two different sets of model parameters.…”
Section: Introduction (mentioning)
confidence: 99%