2017
DOI: 10.1007/978-3-319-69365-1_14

Extending Feature Decay Algorithms Using Alignment Entropy

Abstract: In machine-learning applications, data selection is of crucial importance if good runtime performance is to be achieved. Feature Decay Algorithms (FDA) have demonstrated excellent performance in a number of tasks. While the decay function is at the heart of the success of FDA, its parameters are initialised with the same weights. In this paper, we investigate the effect on Machine Translation of assigning more appropriate weights to words using word-alignment entropy. In experiments on German-to-English, we sh…

Cited by 8 publications (8 citation statements)
References 9 publications
“…The data sets used in the experiments are based on the ones used in the work of Biçici (2013) and Poncelas et al. (2016): (i) Languages: German-to-English and Czech-to-English; (ii) Training data: the training data provided in the WMT 2015 (Bojar et al., 2015) translation task, setting a maximum sentence length of 126 words (4.5M sentence pairs, 225M words, in the German-to-English corpus and 11M sentence pairs, 355M words, in the Czech-to-English corpus); (iii) Tuning data: we use 5K randomly sampled sentences from development sets from previous years; (iv) Language Model: an 8-gram Language Model (LM) built on the target-language side of the selected data via the KenLM toolkit (Heafield, 2011) using Kneser-Ney smoothing; (v) Selected sentences: we select 66.4 million words in total (source- and target-language sides) in each experiment; (vi) Test set: documents provided in the WMT 2015 Translation Task.…”
Section: Methods
confidence: 99%
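As a concrete illustration of step (iv) in the setup above, the following sketch estimates an 8-gram Kneser-Ney model with KenLM's lmplz tool from Python. The file names are hypothetical, and the exact estimation flags used in the cited experiments are not reported, so treat this as a minimal plausible invocation rather than the authors' pipeline.

```python
import subprocess

# Hypothetical file names: "selected.en" is the target-language side of the
# selected data, "lm.arpa" the resulting ARPA-format language model.
# lmplz is KenLM's estimation binary; -o sets the n-gram order (8 here),
# and lmplz applies modified Kneser-Ney smoothing by default.
with open("selected.en", "rb") as text, open("lm.arpa", "wb") as arpa:
    subprocess.run(["lmplz", "-o", "8"], stdin=text, stdout=arpa, check=True)
```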
“…In Poncelas et al. (2016), experiments were carried out using unigrams as features and varying the parameter d in (1). The alignment probabilities were obtained using FastAlign and GIZA++, and the probabilities calculated by GIZA++ achieved better results.…”
Section: Alignment Entropy of Unigrams as an Extension of FDA
confidence: 99%
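Since the extension hinges on word-alignment entropy, a minimal sketch of the measure may help. It assumes the lexical translation probabilities p(t|s) have already been extracted and normalised from a FastAlign or GIZA++ run, which is what those toolkits provide.

```python
import math

def alignment_entropy(translation_probs):
    """Shannon entropy of one source word's alignment distribution.

    translation_probs: dict mapping target word -> p(target | source),
    assumed to sum to 1 (e.g. one row of a GIZA++ lexical table).
    Low entropy: the word aligns consistently; high entropy: it is ambiguous.
    """
    return -sum(p * math.log(p) for p in translation_probs.values() if p > 0.0)

# A word aligned uniformly to two targets has entropy log(2) ≈ 0.693:
print(alignment_entropy({"house": 0.5, "home": 0.5}))
```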
“…These values are by default (Biçici and Yuret, 2011) 0.5 and 0.0 for d and c, respectively; with the default values, the decay function in Equation (4) reduces to decay(f) = init(f) · 0.5^(C_L(f)). There are alternative ways of setting these values (Poncelas et al., 2016, 2017) that can obtain better results. However, in this work we used the default configuration of d = 0.5, c = 0.0 and used trigrams as features.…”
Section: Feature Decay Algorithms
confidence: 99%
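Assuming the full decay function combines an exponential term d^(C_L(f)) with a polynomial term (1 + C_L(f))^(-c), which is consistent with the default reduction quoted above, a one-function sketch looks like this:

```python
def decay(init_f, count_f, d=0.5, c=0.0):
    """Decay value of a feature f under FDA.

    init_f:  initial weight init(f) of the feature.
    count_f: C_L(f), how often f occurs in already-selected sentences.
    Assumed combined form: init(f) * d**C_L(f) * (1 + C_L(f))**(-c);
    with the defaults d=0.5, c=0.0 this reduces to init(f) * 0.5**C_L(f).
    """
    return init_f * (d ** count_f) * ((1.0 + count_f) ** (-c))
```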
“…The method for building transductive models that we explore in our paper is Feature Decay Algorithms (FDA). Original FDA and its variants (Poncelas, Way, and Toral 2016; Poncelas, Maillette de Buy Wenniger, and Way 2017) are data selection techniques that use information from the test set to select sentences from a parallel corpus used for training an MT model. Another characteristic of these methods is that they are context-dependent data selection techniques.…”
Section: Introduction
confidence: 99%
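To make the transductive, context-dependent behaviour concrete, here is a deliberately simplified greedy FDA-style loop. A real implementation keeps a priority queue over sentence scores rather than rescoring the whole pool on every iteration, and init(f) = 1 is an assumption made here for brevity.

```python
def fda_select(test_features, corpus, n_select, d=0.5):
    """Greedy sketch: repeatedly pick the sentence whose test-set features
    currently carry the most value, then decay those features so that
    redundant sentences score lower on later iterations."""
    value = {f: 1.0 for f in test_features}   # init(f) = 1, a simplification
    count = {f: 0 for f in test_features}     # C_L(f), occurrences so far
    pool, selected = list(corpus), []
    for _ in range(min(n_select, len(pool))):
        best = max(pool, key=lambda s: sum(value.get(f, 0.0) for f in set(s)))
        pool.remove(best)
        selected.append(best)
        for f in set(best) & value.keys():    # only test-set features decay
            count[f] += 1
            value[f] = d ** count[f]          # c = 0.0 default, init(f) = 1
    return selected

# Toy usage with unigram features of a hypothetical test set: the two
# distinct sentences are chosen, and the redundant duplicate is skipped.
test = {"alignment", "entropy", "fda"}
pool = [["alignment", "entropy"], ["fda", "fda"], ["alignment", "entropy"]]
print(fda_select(test, pool, 2))
```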