Proceedings of the Fourth Workshop on Statistical Machine Translation (StatMT '09), 2009
DOI: 10.3115/1626431.1626458

SMT and SPE machine translation systems for WMT'09

Abstract: This paper describes the development of several machine translation systems for the 2009 WMT shared task evaluation. We only consider the translation between French and English. We describe a statistical system based on the Moses decoder and a statistical post-editing system using SYSTRAN's rule-based system. We also investigated techniques to automatically extract additional bilingual texts from comparable corpora.

Cited by 12 publications (11 citation statements)
References 15 publications
“…2 The video recordings, audio track, metadata, synopsis, cast, detected shots, detected faces, visual concepts, subtitles, and two automatic transcripts, provided by LIMSI [15] and LIUM [23], are available together with the recordings themselves.…”
Section: Search and Hyperlinking Data
confidence: 99%
“…One of the simplest strategies, which has already been put into practice with the Apertium bilingual dictionaries (Tyers 2009; Sanchez-Cartagena and Pérez-Ortiz 2010), consists of adding the dictionary entries directly to the parallel corpus. In addition to the obvious increase in lexical coverage, Schwenk et al. (2009) state that the quality of the alignments obtained is also improved when the words in the bilingual dictionary appear in other sentences of the parallel corpus. However, it is not guaranteed that, following this strategy, multi-word expressions from the bilingual dictionary that appear in the SL sentences are translated as such because they may be split into smaller units by the phrase-extraction algorithm.…”
Section: Hybrid and Rule-based MT Systems
confidence: 99%
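The dictionary-concatenation strategy quoted above amounts to treating each bilingual dictionary entry as one extra sentence pair before word alignment. A minimal sketch (function and variable names are illustrative, not from any of the cited systems):

```python
def augment_parallel_corpus(corpus_pairs, dictionary_entries):
    """Append bilingual dictionary entries to a parallel corpus as extra
    sentence pairs, so the word aligner sees each entry verbatim."""
    # corpus_pairs: list of (source_sentence, target_sentence) tuples
    # dictionary_entries: list of (source_term, target_term) tuples
    return corpus_pairs + list(dictionary_entries)

corpus = [("la maison bleue", "the blue house")]
dictionary = [("maison", "house"), ("bleu", "blue")]
augmented = augment_parallel_corpus(corpus, dictionary)
# The aligner is then run on `augmented` instead of `corpus`.
```

As the quoted statement notes, this raises lexical coverage and can sharpen alignments, but gives no guarantee that multi-word entries survive phrase extraction intact.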
“…We selected TextTiling for its robustness and simplicity, although more advanced techniques such as TopicTiling [24] (same core algorithm but with LDA topic modeling) are also available and could be tested in the future. Table 1 shows the total number of segments, the average segment size (in seconds) and the standard deviation STD (in seconds) for each of the three alternative transcript types of the BBC TV shows available for segmentation: subtitles vs. ASR from LIMSI [11] vs. ASR from LIUM [27]. The longer size of the LIUM-based segments and the larger variability of subtitle-based segments should be noted.…”
Section: Topic Segmentation
confidence: 99%
“…We generate the data units, namely topic-based segments, from the human-made subtitles provided by the BBC or from the ASR transcripts. We experimented with transcripts from LIMSI [11] and from LIUM [27], both kindly provided to the MediaEval participants. The topic segmentation was performed over the words using the TextTiling algorithm implemented in NLTK.…”
Section: System Overview
confidence: 99%
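The statements above segment transcripts with the TextTiling algorithm (via NLTK in the cited work). A simplified pure-Python sketch of the core idea — lexical cohesion between adjacent word windows, with boundaries where cohesion drops — assuming fixed-size windows and a flat threshold rather than NLTK's full depth-score machinery:

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def texttiling_boundaries(words, window=20, threshold=0.1):
    """Return word indices where lexical cohesion between the adjacent
    windows falls below `threshold` -- candidate topic shifts.
    A simplified sketch of TextTiling, not NLTK's implementation."""
    boundaries = []
    for i in range(window, len(words) - window, window):
        left = Counter(words[i - window:i])
        right = Counter(words[i:i + window])
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries

# Two artificial topics with disjoint vocabularies: the only cohesion
# drop is at the seam between them.
words = ("cooking recipe oven " * 20 + "football match goal " * 20).split()
seams = texttiling_boundaries(words, window=20, threshold=0.1)
```

The window size and threshold here are illustrative; the cited systems additionally smooth the similarity curve and pick boundaries at depth minima.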