Proceedings of the Ninth Workshop on Statistical Machine Translation 2014
DOI: 10.3115/v1/w14-3323

Manawi: Using Multi-Word Expressions and Named Entities to Improve Machine Translation

Abstract: We describe the Manawi system submitted to the 2014 WMT translation shared task. We participated in the English-Hindi (EN-HI) and Hindi-English (HI-EN) language pairs and achieved a Translation Error Rate (TER) score of 0.792 for EN-HI, the lowest among the competing systems. Our main innovations are (i) the use of outputs from NLP tools, viz. a bilingual multi-word expression extractor and a named-entity recognizer, to improve SMT quality and (ii) the introduction of a novel filter method based on sen…

Citations: cited by 21 publications (19 citation statements)
References: 8 publications

“…University of Stuttgart / University of Munich (Quernheim and Cap, 2014) (Do et al, 2014)
MANAWI-* | Universität des Saarlandes (Tan and Pal, 2014)
MATRAN | Abu-MaTran Project: Prompsit / DCU / UA (Rubino et al, 2014)
PROMT-RULE, PROMT-HYBRID | PROMT
RWTH | RWTH Aachen
STANFORD | Stanford University (Neidert et al, 2014; Green et al, 2014)
UA-* | University of Alicante
UEDIN-PHRASE, UEDIN-UNCNSTR | University of Edinburgh (Durrani et al, 2014b)
UEDIN-SYNTAX | University of Edinburgh
UU, UU-DOCENT | Uppsala University (Hardmeier et al, 2014)
Y-SDA | Yandex School of Data Analysis (Borisov and Galinskaya, 2014)
COMMERCIAL-[1,2] | Two commercial machine translation systems
ONLINE-[A,B,C,G] | Four online statistical machine translation systems
4] | Two rule-based statistical machine translation systems
Table 2: Participants in the shared translation task. Not all teams participated in all language pairs.…”
Section: Ims-ttt (mentioning)
confidence: 99%
“…Section 11.3 presents the research carried out in the EXPERT project that incorporates information from a paraphrase database into matching and retrieval from translation memories, and shows how this can improve the productivity of professional translators, and how to deal with large translation memories. In the same vein of research, Tan and Pal [2014] proposed several methods for terminology extraction and ontology induction with the aim of integrating them in translation memories and statistical machine translation.…”
Section: Language Technology In Translation Memory (mentioning)
confidence: 99%
“…The same method was also applied to the monolingual data. Successively, the corpus cleaning process was carried out first by calculating the global mean ratio of the number of characters in a source sentence to that in the corresponding target sentence and then filtering out sentence pairs that exceed or fall below 20% of the global ratio (Tan and Pal, 2014). Tokenization and punctuation normalization were performed using Moses scripts.…”
Section: Datasets (mentioning)
confidence: 99%
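
The corpus cleaning step described in the Datasets statement above can be sketched roughly as follows. This is a minimal illustration, not the code of Tan and Pal (2014) or of the citing paper. Two readings are assumed here: the "global mean ratio" is taken as the mean of per-pair character-length ratios (source over target), and "20%" is taken as a tolerance band of plus or minus 20% around that mean. The function names and the `tolerance` parameter are hypothetical.

```python
from typing import List, Tuple

SentencePair = Tuple[str, str]


def global_char_ratio(pairs: List[SentencePair]) -> float:
    """Mean of len(source) / len(target) over all non-empty pairs.

    Assumption: the "global mean ratio" is the average of per-pair ratios,
    not the ratio of total character counts.
    """
    ratios = [len(src) / len(tgt) for src, tgt in pairs if src and tgt]
    if not ratios:
        raise ValueError("no non-empty sentence pairs to compute a ratio from")
    return sum(ratios) / len(ratios)


def filter_by_char_ratio(
    pairs: List[SentencePair], tolerance: float = 0.2
) -> List[SentencePair]:
    """Keep pairs whose character ratio lies within +/- tolerance of the mean.

    Assumption: "exceed or fall below 20% of the global ratio" means the
    per-pair ratio is outside [mean * 0.8, mean * 1.2]. Pairs with an empty
    side are dropped because their ratio is undefined.
    """
    mean_ratio = global_char_ratio(pairs)
    lower = mean_ratio * (1.0 - tolerance)
    upper = mean_ratio * (1.0 + tolerance)
    return [
        (src, tgt)
        for src, tgt in pairs
        if src and tgt and lower <= len(src) / len(tgt) <= upper
    ]
```

Under this reading the threshold is relative to the corpus itself, so a single tolerance value can be reused across language pairs whose typical length ratios differ. In practice the function would be applied to the raw parallel corpus before tokenization.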