The focus of our workshop was the use of parallel corpora for machine translation. Recent experimentation has shown that the performance of SMT systems varies greatly with the source language. In this workshop we encouraged researchers to investigate ways to improve the performance of SMT systems for diverse languages, including morphologically complex languages, languages with partially free word order, and low-resource languages.

Prior to the workshop, in addition to soliciting relevant papers for review and possible presentation, we conducted four shared tasks: a general translation task, a medical translation task, a quality estimation task, and a task to test automatic evaluation metrics. The medical translation task was introduced this year to address the important issue of domain adaptation within SMT. The results of the shared tasks were announced at the workshop, and these proceedings include an overview paper for the shared tasks that summarizes the results and provides information about the data used and the procedures followed in conducting and scoring each task. In addition, short papers from each participating team describe their underlying systems in greater detail.

As in previous years, we received far more submissions than we could accept for presentation: this year, 27 full paper submissions and 49 shared task submissions. In total, WMT 2014 featured 12 full paper oral presentations and 49 shared task poster presentations.

The invited talk, "Machine Translation in Academia and in the Commercial World - a Contrastive Perspective", was given by Alon Lavie (Carnegie Mellon University and Safaba Translation Solutions, Inc.).

We would like to thank the members of the Program Committee for their timely reviews. We also thank the participants of the shared tasks and all the other volunteers who helped with the evaluations.
A principal output of the annual Workshop on Statistical Machine Translation is a ranking of the systems that participated in its shared translation tasks, produced by aggregating pairwise sentence-level comparisons collected from human judges. Over the past few years, the aggregation formula has been tweaked a number of times in attempts to address issues arising from the inherent ambiguity and subjectivity of the task, as well as weaknesses in the proposed models and the manner of model selection.

We continue this line of work by adapting the TrueSkill™ algorithm, an online approach for modeling the relative skills of players in ongoing competitions such as Microsoft's Xbox Live, to the human evaluation of machine translation output. Our experimental results show that TrueSkill outperforms other recently proposed models on accuracy, and that it can significantly reduce the number of pairwise annotations that need to be collected by sampling non-uniformly from the space of system competitions.
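To make the approach concrete, the following is a minimal sketch of a TrueSkill-style update for a single two-system comparison with no ties. Each system carries a skill estimate (mu, sigma); when one system's output is judged better, the winner's mean rises, the loser's falls, and both uncertainties shrink. The parameter values and function names here are illustrative assumptions, not the paper's implementation, which additionally handles ties and the sampling of which pairs to judge.

```python
# Simplified TrueSkill-style update for one pairwise judgment (no ties).
# BETA and the initial (mu, sigma) below are conventional defaults, used
# here purely for illustration.
import math

BETA = 25.0 / 6  # assumed performance-noise parameter

def pdf(t):
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def cdf(t):
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def update(winner, loser):
    """winner/loser are (mu, sigma) pairs; returns the updated pairs."""
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = math.sqrt(2 * BETA ** 2 + sig_w ** 2 + sig_l ** 2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)   # additive correction to the means
    w = v * (v + t)       # multiplicative shrinkage of the variances
    mu_w += sig_w ** 2 / c * v
    mu_l -= sig_l ** 2 / c * v
    sig_w *= math.sqrt(max(1 - sig_w ** 2 / c ** 2 * w, 1e-9))
    sig_l *= math.sqrt(max(1 - sig_l ** 2 / c ** 2 * w, 1e-9))
    return (mu_w, sig_w), (mu_l, sig_l)
```

Because each judgment updates only the two systems involved, the estimates can be maintained online as annotations arrive, and the current uncertainties can guide which pairs to present to judges next.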
Introduction

The Workshop on Statistical Machine Translation (WMT) has long been a central event in the machine translation (MT) community for the evaluation of MT output. It hosts...