2017
DOI: 10.1007/978-3-319-69365-1_14

Extending Feature Decay Algorithms Using Alignment Entropy

Abstract: In machine-learning applications, data selection is of crucial importance if good runtime performance is to be achieved. Feature Decay Algorithms (FDA) have demonstrated excellent performance in a number of tasks. While the decay function is at the heart of the success of FDA, its parameters are initialised with the same weights. In this paper, we investigate the effect on Machine Translation of assigning more appropriate weights to words using word-alignment entropy. In experiments on German-to-English, we sh…

Cited by 8 publications (8 citation statements)
References 9 publications
“…The data sets used in the experiments are based on the ones used in the work of Biçici (2013) and Poncelas et al. (2016): (i) Languages: German-to-English and Czech-to-English; (ii) Training data: the training data provided in the WMT 2015 (Bojar et al., 2015) translation task, setting a maximum sentence length of 126 words (4.5M sentence pairs, 225M words, in the German-to-English corpus and 11M sentence pairs, 355M words, in the Czech-to-English corpus); (iii) Tuning data: we use 5K randomly sampled sentences from development sets from previous years; (iv) Language Model: an 8-gram Language Model (LM) built on the target-language side of the selected data via the KenLM toolkit (Heafield, 2011) using Kneser-Ney smoothing; (v) Selected sentences: we select 66.4 million words in total (source- and target-language sides) in each experiment; (vi) Test set: documents provided in the WMT 2015 Translation Task.…”
Section: Methods
confidence: 99%
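As a concrete illustration of step (iv) in the setup above, the following sketch estimates an 8-gram Kneser-Ney model with KenLM's lmplz tool from Python. The file names are hypothetical, and the exact estimation flags used in the cited experiments are not reported, so treat this as a minimal plausible invocation rather than the authors' pipeline.

```python
import subprocess

# Hypothetical file names: "selected.en" is the target-language side of the
# selected data, "lm.arpa" the resulting ARPA-format language model.
# lmplz is KenLM's estimation binary; -o sets the n-gram order (8 here),
# and lmplz applies modified Kneser-Ney smoothing by default.
with open("selected.en", "rb") as text, open("lm.arpa", "wb") as arpa:
    subprocess.run(["lmplz", "-o", "8"], stdin=text, stdout=arpa, check=True)
```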
“…In Poncelas et al. (2016), experiments were carried out using unigrams as features and varying the parameter d in (1). The alignment probabilities were obtained using FastAlign and GIZA++, and the probabilities calculated by GIZA++ achieved better results.…”
Section: Alignment Entropy of Unigrams as an Extension of FDA
confidence: 99%
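Since the extension hinges on word-alignment entropy, a minimal sketch of the measure may help. It assumes the lexical translation probabilities p(t|s) have already been extracted and normalised from a FastAlign or GIZA++ run, which is what those toolkits provide.

```python
import math

def alignment_entropy(translation_probs):
    """Shannon entropy of one source word's alignment distribution.

    translation_probs: dict mapping target word -> p(target | source),
    assumed to sum to 1 (e.g. one row of a GIZA++ lexical table).
    Low entropy: the word aligns consistently; high entropy: it is ambiguous.
    """
    return -sum(p * math.log(p) for p in translation_probs.values() if p > 0.0)

# A word aligned uniformly to two targets has entropy log(2) ≈ 0.693:
print(alignment_entropy({"house": 0.5, "home": 0.5}))
```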
“…These values are by default (Biçici and Yuret, 2011) 0.5 and 0.0 for d and c, respectively; with the default values, the decay function in Equation (4) reduces to decay(f) = init(f) · 0.5^(C_L(f)). There are alternative ways of setting these values (Poncelas et al., 2016, 2017) that can obtain better results. However, in this work we used the default configuration of d = 0.5, c = 0.0 and used trigrams as features.…”
Section: Feature Decay Algorithms
confidence: 99%
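Assuming the full decay function combines an exponential term d^(C_L(f)) with a polynomial term (1 + C_L(f))^(-c), which is consistent with the default reduction quoted above, a one-function sketch looks like this:

```python
def decay(init_f, count_f, d=0.5, c=0.0):
    """Decay value of a feature f under FDA.

    init_f:  initial weight init(f) of the feature.
    count_f: C_L(f), how often f occurs in already-selected sentences.
    Assumed combined form: init(f) * d**C_L(f) * (1 + C_L(f))**(-c);
    with the defaults d=0.5, c=0.0 this reduces to init(f) * 0.5**C_L(f).
    """
    return init_f * (d ** count_f) * ((1.0 + count_f) ** (-c))
```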
“…The method for building transductive models that we explore in our paper is Feature Decay Algorithms (FDA). Original FDA and its variants (Poncelas, Way, and Toral 2016; Poncelas, Maillette de Buy Wenniger, and Way 2017) are data selection techniques that use information from the test set to select sentences from a parallel corpus used for training an MT model. Another characteristic of these methods is that they are context-dependent data selection techniques.…”
Section: Introduction
confidence: 99%
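To make the transductive, context-dependent behaviour concrete, here is a deliberately simplified greedy FDA-style loop. A real implementation keeps a priority queue over sentence scores rather than rescoring the whole pool on every iteration, and init(f) = 1 is an assumption made here for brevity.

```python
def fda_select(test_features, corpus, n_select, d=0.5):
    """Greedy sketch: repeatedly pick the sentence whose test-set features
    currently carry the most value, then decay those features so that
    redundant sentences score lower on later iterations."""
    value = {f: 1.0 for f in test_features}   # init(f) = 1, a simplification
    count = {f: 0 for f in test_features}     # C_L(f), occurrences so far
    pool, selected = list(corpus), []
    for _ in range(min(n_select, len(pool))):
        best = max(pool, key=lambda s: sum(value.get(f, 0.0) for f in set(s)))
        pool.remove(best)
        selected.append(best)
        for f in set(best) & value.keys():    # only test-set features decay
            count[f] += 1
            value[f] = d ** count[f]          # c = 0.0 default, init(f) = 1
    return selected

# Toy usage with unigram features of a hypothetical test set: the two
# distinct sentences are chosen, and the redundant duplicate is skipped.
test = {"alignment", "entropy", "fda"}
pool = [["alignment", "entropy"], ["fda", "fda"], ["alignment", "entropy"]]
print(fda_select(test, pool, 2))
```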