2014
DOI: 10.1145/2518130

Statistical machine translation enhancements through linguistic levels

Abstract: Machine translation can be considered a highly interdisciplinary and multidisciplinary field because it is approached from the points of view of human translators, engineers, computer scientists, mathematicians and linguists. One of the most popular approaches is statistical machine translation (SMT), which tries to cover translation in a holistic manner by learning from a parallel corpus aligned at the sentence level. However, with this basic approach, there are some issues at each written linguistic…

Cited by 27 publications (10 citation statements) · References 85 publications
“…The hybridization of machine translation systems in order to benefit from both statistical and linguistically motivated approaches is becoming a popular trend in the translation field. Such a trend is well described in a number of surveys (Costa-jussà and Farrús, 2014; Costa-jussà and Fonollosa, 2015) and witnessed by recent initiatives in the NLP community, such as the HyTra workshop series¹. The motivations for this choice can be manifold, but they essentially lie in the need either to reduce the costs, in terms of both time and resources, of building a fully rule-based system, or to integrate statistical models or SMT outputs with linguistic knowledge, as this could help capture complex translation phenomena that data-driven approaches cannot handle properly.…”
Section: Introduction (mentioning; confidence: 74%)
“…In 1960 Bar-Hillel [4] stated that an MT system cannot find the right meaning without specific knowledge. Although the ambiguity problem has been lessened significantly since the contribution of Carpuat and subsequent works [181][182][183], it still remains a challenge. As seen in Section 3, MT systems still try to resolve this problem by using domain-specific language models to prefer domain-specific expressions, but when translating a highly ambiguous sentence or a short text that covers multiple domains, the language models are not enough.…”
Section: Disambiguation (mentioning; confidence: 99%)
“…The lexical weighting features estimate the probability of a phrase pair word by word, which suffers from sparseness in low-resource scenarios. In this paper, we adopt Mongolian morphemes instead of words to estimate phrase pairs, reducing the sparseness caused by rich morphology [3]. The morpheme-based weighting is defined over morpheme-level alignments, and alignment quality would be improved by morphemes [10].…”
Section: Motivation (mentioning; confidence: 99%)
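For readers unfamiliar with the feature mentioned above, a minimal sketch of the commonly used word-level lexical weight follows: p_w(e | f, a) = ∏_i [ 1/|{j : (i, j) ∈ a}| · Σ_{(i,j)∈a} w(e_i | f_j) ], where unaligned target words are scored against a NULL token. The function name `lexical_weight`, the `NULL` token and the toy probabilities are illustrative assumptions; this is not the citing paper's morpheme-based implementation. Under the assumption that the same formula carries over, feeding it morpheme tokens and morpheme-level alignments instead of words would give the morpheme-level analogue.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

NULL = "NULL"  # hypothetical pseudo-token that unaligned target words are scored against

def lexical_weight(
    e_phrase: Sequence[str],
    f_phrase: Sequence[str],
    alignment: Iterable[Tuple[int, int]],
    w: Dict[Tuple[str, str], float],
) -> float:
    """Word-level lexical weight p_w(e | f, a) of one phrase pair.

    alignment holds (i, j) links between e_phrase[i] and f_phrase[j];
    w[(e, f)] is a word translation probability w(e | f).
    """
    # Group the source positions linked to each target position i.
    links: Dict[int, List[int]] = {}
    for i, j in alignment:
        links.setdefault(i, []).append(j)

    score = 1.0
    for i, e_word in enumerate(e_phrase):
        aligned = links.get(i, [])
        if aligned:
            # Average w(e_i | f_j) over all source words aligned to e_i.
            score *= sum(w.get((e_word, f_phrase[j]), 0.0) for j in aligned) / len(aligned)
        else:
            # Unaligned target words are treated as aligned to NULL.
            score *= w.get((e_word, NULL), 0.0)
    return score

if __name__ == "__main__":
    toy_w = {("the", "das"): 0.5, ("house", "Haus"): 0.8}  # made-up probabilities
    print(lexical_weight(["the", "house"], ["das", "Haus"], {(0, 0), (1, 1)}, toy_w))
```

With one-to-one links the toy call simply multiplies 0.5 × 0.8 = 0.4. The product over target words is what makes the feature fragile under sparse data: a single unseen word pair drives the whole score to zero, which is the motivation the citing work gives for moving to morphemes.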
“…Much of the work on SMT has shown that morphological segmentation can improve SMT quality because of the sparseness reduction it provides [3]. One approach [4] incorporated morphology into the factored translation model for Chinese-Mongolian SMT and attempted to resolve the problem of selecting word forms in the output sentences.…”
Section: Introduction (mentioning; confidence: 99%)
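The sparseness argument above can be illustrated with a toy count: once inflected forms share a stem, the stem and the affixes are each observed several times, so both the vocabulary size and the number of singletons drop. The suffix list, the `segment` function and the English toy corpus below are hypothetical stand-ins chosen for illustration; they are not the Mongolian segmentation used in the cited works, which would rely on a real morphological analyser.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical suffix inventory, for illustration only.
SUFFIXES = ("ing", "ed", "s")

def segment(word: str) -> List[str]:
    """Toy segmenter: split off the first matching suffix, marked with '+'."""
    for suf in SUFFIXES:
        # Require a stem of at least 3 characters to avoid splitting short words.
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return [word[: -len(suf)], "+" + suf]
    return [word]

def sparsity_stats(tokens: Sequence[str]) -> Tuple[int, int]:
    """Return (number of types, number of types seen exactly once)."""
    counts = Counter(tokens)
    singletons = sum(1 for c in counts.values() if c == 1)
    return len(counts), singletons

corpus = "walks walked walking talks talked talking jumps jumped jumping".split()
segmented = [m for word in corpus for m in segment(word)]

if __name__ == "__main__":
    print("words:    types=%d singletons=%d" % sparsity_stats(corpus))
    print("morphemes: types=%d singletons=%d" % sparsity_stats(segmented))
```

On this toy corpus every one of the 9 word forms is a singleton, whereas after segmentation there are 6 types (3 stems, 3 suffixes), each seen three times. Real gains depend on the language and segmenter, so this only shows the direction of the effect.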