Introduction

The quality of automatic translation of human languages has improved tremendously over the past decade or so. While state-of-the-art machine translation systems still do not achieve publication-quality performance in most cases, they can now deliver a level of quality that makes the post-editing of raw machine output by human translators a viable and cost-effective alternative to translation from scratch. Moreover, computerized workflow management can improve consistency in translation, particularly with respect to terminology, and can give translators easy access to dictionaries, glossaries, and databases of past translations.

Much past research in the machine translation community has focused on improving fully automatic MT, but interest in integrating information technology, and specifically machine translation technology, into the translator's workflow is growing in many areas of research: in machine translation research, on how best to provide useful information to the human translator; in translation tool development, on how to make the best use of this information; and in translation process studies, on understanding the cognitive and physical processes that take place when humans post-edit or interact with computer-produced translations.

This workshop brings together researchers investigating issues in human-computer interaction in the context of translation from a variety of research angles. We have assembled a wonderful roster of talks, posters, and system demonstrations that nicely illustrate the current state of research, and we look forward to a productive day of learning and fruitful discussions.

This paper proposes to use Word Confidence Estimation (WCE) information to improve MT output via N-best list re-ranking. From the confidence label assigned to each word in the MT hypothesis, we derive six scores that are added to the baseline log-linear model in order to re-rank the N-best list. First, the correlation between the WCE-based sentence-level scores and conventional evaluation scores (BLEU, TER, TERp-A) is investigated. Then, N-best list re-ranking is evaluated over different WCE system performance levels: from our real and efficient WCE system (ranked 1st in the WMT 2013 Quality Estimation Task) to an oracle WCE (which simulates an interactive scenario where a user simply validates the words of an MT hypothesis and a new output is automatically re-generated). The results suggest that our real WCE system slightly (but significantly) improves over the baseline, while the oracle one boosts it substantially; better WCE leads to better MT quality.
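To make the re-ranking idea concrete, the sketch below shows how sentence-level features derived from per-word confidence labels can be combined with a decoder's baseline score in a log-linear model to re-order an N-best list. The feature definitions, names, and weights here are illustrative assumptions only; the paper's six WCE-based scores are not specified in this abstract, and the actual implementation may differ.

    # Minimal sketch of WCE-based N-best re-ranking with a log-linear model.
    # Feature definitions and weights are hypothetical, not the authors' own.

    def longest_run(labels):
        """Length of the longest consecutive run of "good" labels."""
        best = cur = 0
        for ok in labels:
            cur = cur + 1 if ok else 0
            best = max(best, cur)
        return best

    def wce_features(words, labels):
        """Derive example sentence-level scores from per-word confidence labels.

        labels[i] is True if word i is predicted "good", False if "bad".
        The paper adds six such scores; these three are stand-ins.
        """
        n = len(words)
        n_good = sum(labels)
        return {
            "good_ratio": n_good / n,                        # fraction of good words
            "bad_count": float(n - n_good),                  # number of bad words
            "longest_good_run": float(longest_run(labels)),  # longest good span
        }

    def rerank(nbest, weights):
        """Return the hypothesis maximizing the weighted feature sum.

        nbest is a list of (words, labels, baseline_score) tuples, where
        baseline_score is the decoder's original log-linear model score.
        """
        def total(hyp):
            words, labels, baseline = hyp
            feats = wce_features(words, labels)
            return baseline + sum(weights[k] * v for k, v in feats.items())
        return max(nbest, key=total)

    # Hypothetical usage: in practice the weights would be tuned on a
    # development set (e.g. with MERT) alongside the baseline features.
    weights = {"good_ratio": 2.0, "bad_count": -0.5, "longest_good_run": 0.3}
    nbest = [
        (["the", "cat", "sat"], [True, True, True], -4.2),
        (["the", "cat", "sit"], [True, True, False], -3.9),
    ]
    best_words, _, _ = rerank(nbest, weights)
    print(" ".join(best_words))  # the WCE features promote the all-"good" hypothesis

Under this framing, the oracle condition described in the abstract would correspond to feeding gold-standard labels into wce_features, as if a user had validated each word of the hypothesis by hand.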