Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP '08)
DOI: 10.3115/1613715.1613758
Sampling alignment structure under a Bayesian translation model

Abstract: We describe the first tractable Gibbs sampling procedure for estimating phrase pair frequencies under a probabilistic model of phrase alignment. We propose and evaluate two nonparametric priors that successfully avoid the degenerate behavior noted in previous work, where overly large phrases memorize the training data. Phrase table weights learned under our model yield an increase in BLEU score over the word-alignment-based heuristic estimates used regularly in phrase-based translation systems.
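The model sketched in the abstract can be made concrete with a small illustration. The following is a minimal sketch, not the authors' implementation: it scores a single phrase pair under a Dirichlet Process prior, where the base measure penalizes long phrases so that overly large pairs receive little prior mass. The function names, the concentration parameter alpha, and the geometric length penalty are all assumptions standing in for the paper's actual priors.

# Minimal sketch (assumed, not from the paper): posterior predictive
# probability of a phrase pair (e, f) under a Dirichlet Process prior.
from collections import Counter

def base_prob(e, f, stop=0.5):
    # Hypothetical base measure P0: a geometric penalty on total phrase
    # length, so long phrases get little prior mass and cannot simply
    # memorize the training data. A real base measure would also score
    # the words themselves, e.g. with IBM Model 1.
    return stop ** (len(e.split()) + len(f.split()))

def dp_phrase_prob(pair, counts, total, alpha=1.0):
    # Reuse an observed pair proportionally to its count; back off to the
    # base measure with mass proportional to alpha.
    e, f = pair
    return (counts[pair] + alpha * base_prob(e, f)) / (total + alpha)

counts = Counter({("house", "maison"): 5, ("the house", "la maison"): 3})
print(dp_phrase_prob(("house", "maison"), counts, sum(counts.values())))

Under this predictive, frequently resampled pairs gain probability while unseen pairs fall back on the length-penalized base measure, which is one way a nonparametric prior can discourage the degenerate large-phrase solutions the abstract mentions.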

Cited by 43 publications (34 citation statements). References 17 publications.
“…[13] have successfully applied a similar Bayesian technique to grammar induction, and [14], [15] have developed tractable Bayesian methods for the more complex task of bilingual phrase pair extraction for SMT, which involves reordering. [16] tackle the overfitting problem in phrasal alignment with a leave-one-out strategy that, despite being a different paradigm, shares many of the characteristics of our approach.…”
Section: Motivation
confidence: 99%
“…More sophisticated methods of defining the base measure are possible; for example, [14], [15] use the IBM Model 1 likelihood of one phrase conditioned on the other in the base model to encourage the formation of bilingual pairs that follow the word alignments in the corpus.…”

Algorithm 1: The blocked Gibbs sampling algorithm (as quoted).
  Input: Random initial corpus segmentation
  Output: Unsupervised co-segmentation of the corpus according to the model
  foreach iter = 1 to NumIterations do
    foreach bilingual word-pair w ∈ randperm(W) do
      foreach co-segmentation γ_i of w do
        Compute the probability p(γ_i | h), where h is the set of data
        (excluding w) and its hidden co-segmentation
      end
      Sample a co-segmentation γ_i from the distribution p(γ_i | h)
      Update counts
    end
  end

Section: The Base Measure
confidence: 99%
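The pseudocode quoted above maps directly onto a short sampler loop. Below is a runnable sketch of that loop, assuming hypothetical helpers candidates_for (enumerating the co-segmentations γ_i of a word pair), conditional_prob (computing p(γ_i | h)), and remove/add (maintaining the count statistics); these stand in for the model-specific parts the citing paper defines.

import random

def blocked_gibbs(segmentation, counts, num_iterations,
                  candidates_for, conditional_prob, remove, add):
    # segmentation: dict mapping each bilingual word-pair w to its current
    # co-segmentation; counts: sufficient statistics over all of them.
    for _ in range(num_iterations):
        order = list(segmentation)
        random.shuffle(order)                    # w ∈ randperm(W)
        for w in order:
            remove(segmentation[w], counts)      # h excludes w itself
            gammas = candidates_for(w)
            weights = [conditional_prob(g, counts) for g in gammas]
            # sample gamma_i from p(gamma_i | h), then update counts
            segmentation[w] = random.choices(gammas, weights=weights, k=1)[0]
            add(segmentation[w], counts)
    return segmentation

Resampling a whole co-segmentation at once, rather than one boundary at a time, is what makes the sampler "blocked"; the trade-off is that candidates_for must enumerate every co-segmentation of w, which is feasible only when the word pairs are short.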
“…Most closely related is the work of DeNero et al. (2008), who derive a Gibbs sampler for phrase-based alignment, using it to infer phrase translation probabilities.…”
Section: Related Work
confidence: 99%
“…Bayesian inference plus the Dirichlet Process (DP) has been shown to effectively prevent MT models from overfitting the training data (DeNero et al., 2008; Blunsom et al., 2008). A similar approach can be applied here for SSMT by considering each TTS template as a cluster and using the DP to adjust the number of TTS templates according to the training data.…”
Section: Bayesian Inference with the Dirichlet Process Prior
confidence: 99%
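To see how a DP lets the template inventory grow with the data, consider the Chinese-restaurant-process predictive it induces. The numbers below are made up purely for illustration, and alpha is an assumed concentration parameter, not a value from the paper.

# Toy illustration (made-up counts): under a DP, an existing TTS template
# is reused proportionally to its count, while a brand-new template is
# created with mass proportional to alpha.
alpha = 1.0
counts = {"t1": 5, "t2": 2}
total = sum(counts.values())

def p_reuse(t):
    return counts[t] / (total + alpha)

def p_new():
    return alpha / (total + alpha)

print(p_reuse("t1"), p_reuse("t2"), p_new())   # 0.625 0.25 0.125

Because the new-template mass shrinks as total grows, the cluster count expands only when the data genuinely supports additional templates, which is the overfitting control the quotation describes.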
“…Non-parametric Bayesian methods have been successfully applied to directly learn phrase pairs from a bilingual corpus with little or no dependence on word alignments (Blunsom et al., 2008; DeNero et al., 2008). Because such approaches directly learn a generative model over phrase pairs, they are theoretically preferable to the standard heuristics for extracting the phrase pairs from the many-to-one word-level alignments produced by the IBM series models (Brown et al., 1993) or the Hidden Markov Model (HMM) (Vogel et al., 1996).…”
Section: Introduction
confidence: 99%