2011
DOI: 10.1145/2002980.2002986

Using Sublexical Translations to Handle the OOV Problem in Machine Translation

Abstract: We introduce a method for learning to translate out-of-vocabulary (OOV) words. The method focuses on combining sublexical/constituent translations of an OOV to generate its translation candidates. In our approach, wildcard searches are formulated based on our OOV analysis, aimed at maximizing the probability of retrieving OOVs’ sublexical translations from existing resources of Machine Translation (MT) systems. At run-time, translation candidates of the unknown words are generated from their suitable sublexica…

citations
Cited by 8 publications
(6 citation statements)
references
References 11 publications
“…In the decoding time, if a numeral or temporal expression is found, it is substituted by the special symbol so that the surrounding words can be handled properly and finally the numeral/temporal expression is translated with the manually written rules. Huang et al [2011] propose a sublexical translation method to translate Chinese abbreviations and compounds. More recently, Zhang et al [2012] address the problem of the lexical selection and word reordering of the surrounding words caused by unknown words.…”
Section: Lexis (mentioning)
confidence: 99%
“…Previous approaches to the handling of untranslated fragments include using a pivot language to translate the OOV word(s) into a third language and then back into to the source language, thereby extracting paraphrases to OOV (Callison-burch and Osborne, 2006), combining sub-lexical/constituent translations of the OOV word(s) to generate the translation (Huang et al, 2011) or finding paraphrases of the OOV words that have available translations (Marton et al, 2009; Razmara et al, 2013). However the simplest approach to handle untranslated fragments is to increase the size of parallel data.…”
Section: Automatic Post-editors (mentioning)
confidence: 99%
“…Untranslated out-of-vocabulary (OOV) words tend to degrade the accuracy of the output produced by an MT model. Huang (2010) pointed to various types of OOV words which occur in a data set - segmentation error in source language, named entities, combination forms (e.g. widebody) and abbreviations.…”
Section: Related Work (mentioning)
confidence: 99%