2014
DOI: 10.1007/s10590-014-9165-9

Quality estimation-guided supplementary data selection for domain adaptation of statistical machine translation

Abstract: Supplementary data selection is a strongly motivated approach in domain adaptation of statistical machine translation systems. In this paper we report a novel approach of data selection guided by automatic quality estimation. In contrast to the conventional approach of using the entire target-domain data as reference for data selection, we restrict the reference set only to sentences poorly translated by the baseline model. Automatic quality estimation is used to identify such poorly translated sentences in th…
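
The core idea in the abstract (shrink the reference set to poorly translated target-domain sentences before selecting supplementary data) can be illustrated with a small sketch. Everything here is an assumption for illustration, not the paper's method: the function names are hypothetical, QE scores are assumed to lie in [0, 1] with higher meaning better (some QE models predict HTER, where higher means worse, so the comparison would flip), and the selection criterion is stood in for by per-word cross-entropy under an add-one smoothed unigram model trained on the reduced reference set, since the visible abstract does not say which criterion is used.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Add-one smoothed unigram LM; returns a function giving log P(token)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-word cross-entropy of a whitespace-tokenised sentence."""
    toks = sentence.split()
    return -sum(logprob(t) for t in toks) / max(len(toks), 1)


def qe_guided_selection(in_domain, qe_scores, pool, qe_threshold, k):
    """Hypothetical helper: pick k pool sentences closest to the poorly translated subset."""
    # Reference set = target-domain sentences the QE model predicts were translated poorly.
    reference = [s for s, q in zip(in_domain, qe_scores) if q < qe_threshold]
    lm = train_unigram(reference)
    # Lower cross-entropy under the reference LM = more similar to the problem sentences.
    ranked = sorted(pool, key=lambda s: cross_entropy(lm, s))
    return ranked[:k]
```

For example, with in-domain sentences `["the printer driver fails", "install the driver", "hello world"]`, QE scores `[0.2, 0.9, 0.8]` and a threshold of 0.5, only the first sentence forms the reference set, so pool sentences sharing its vocabulary ("update the printer driver", "driver fails to load") are ranked ahead of unrelated ones.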

Cited by 7 publications (6 citation statements)
References: 14 publications

“…This could be done by simply following the extension of the Moore-Lewis method to perform data selection from parallel data (Axelrod et al, 2011) or in combination with other methods which are deemed to be more suitable for parallel data, e.g. (Mansour et al, 2011;Banerjee et al, 2013).…”
Section: Discussion; citation type: mentioning; confidence: 99%
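
The bilingual extension of Moore-Lewis selection referred to above (Axelrod et al., 2011) scores each sentence pair by the sum of source-side and target-side cross-entropy differences, [H_in-src(s) − H_out-src(s)] + [H_in-tgt(t) − H_out-tgt(t)], and keeps the lowest-scoring pairs. The sketch below is a simplified illustration under stated assumptions: it uses add-one smoothed unigram models in place of the n-gram language models normally used, trains the out-of-domain models on the whole candidate pool rather than a sampled subset, and the function names are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Add-one smoothed unigram LM (a stand-in for a real n-gram LM)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen tokens
    return lambda tok: math.log((counts[tok] + 1) / (total + vocab))


def cross_entropy(logprob, sentence):
    """Per-word cross-entropy of a whitespace-tokenised sentence."""
    toks = sentence.split()
    return -sum(logprob(t) for t in toks) / max(len(toks), 1)


def bilingual_ced(pairs_in, pairs_pool):
    """Rank (src, tgt) pool pairs by bilingual cross-entropy difference, most in-domain first."""
    in_src = train_unigram([s for s, _ in pairs_in])
    in_tgt = train_unigram([t for _, t in pairs_in])
    # Simplification: out-of-domain LMs are trained on the full pool.
    out_src = train_unigram([s for s, _ in pairs_pool])
    out_tgt = train_unigram([t for _, t in pairs_pool])
    scored = []
    for s, t in pairs_pool:
        score = (cross_entropy(in_src, s) - cross_entropy(out_src, s)) + (
            cross_entropy(in_tgt, t) - cross_entropy(out_tgt, t)
        )
        scored.append((score, s, t))
    scored.sort()  # lower score = looks more like the in-domain data on both sides
    return [(s, t) for _, s, t in scored]
```

Summing the two sides means a pair is favoured only if both its source and its target resemble the in-domain data, which is the motivation for the bilingual variant over monolingual Moore-Lewis.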
“…The post-editing module of the framework (see also Roturier et al, (2013)) is designed to fulfil the project's objective of collecting post-editing data in order to learn correction rules and, through feedback loops, to integrate them into the SMT engines (with the goal of automating corrections whenever possible). The project relies on the participation of volunteer community members, who are subject matter experts, native speakers of the… The post-editing text is organised in tasks belonging to post-editing projects.…”
Section: Post-editing Module; citation type: mentioning; confidence: 99%
“…Koehn et al (2007) used multiple decoding paths for combining multiple domain-specific translation tables in the state-of-the-art PB-SMT decoder MOSES. Banerjee et al (2013) combined an in-domain model (translation and reordering model) with an out-of-domain model into MOSES and they derived log-linear features to distinguish between phrases of multiple domains by applying the data-source indicator features and showed modest improvement in translation quality. Bach et al (2008) suggested that sentences may be weighted by how much it matches with the target domain.…”
Section: Introduction; citation type: mentioning; confidence: 99%
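
The snippet above mentions combining an in-domain and an out-of-domain model and adding data-source indicator features to a log-linear model. The sketch below only illustrates that general idea and is not the cited authors' implementation: phrase tables are hypothetical dictionaries mapping a (source, target) phrase pair to one log-probability, each merged entry gets binary features recording which table(s) it came from, and pairs are scored by a weighted feature sum. Keeping the in-domain probability when a pair occurs in both tables is an assumption made here, and in a real decoder such as Moses the weights would be tuned (for example with MERT) rather than set by hand.

```python
def merge_with_indicators(in_table, out_table):
    """Merge two toy phrase tables, adding data-source indicator features."""
    merged = {}
    for pair, logp in out_table.items():
        merged[pair] = {"log_p": logp, "from_in": 0.0, "from_out": 1.0}
    for pair, logp in in_table.items():
        entry = merged.get(pair)
        if entry is None:
            merged[pair] = {"log_p": logp, "from_in": 1.0, "from_out": 0.0}
        else:
            # Pair seen in both tables: keep the in-domain estimate (an assumption)
            # and fire both indicators.
            entry["log_p"] = logp
            entry["from_in"] = 1.0
    return merged


def loglinear_score(features, weights):
    """Log-linear model score: weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())
```

With a positive weight on `from_in`, a pair backed by the in-domain table, such as `("driver", "pilote")`, can outrank an out-of-domain-only alternative such as `("driver", "conducteur")` even when the latter has a higher translation probability.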