This paper discusses neural machine translation (NMT), a new paradigm in the MT field, comparing the quality of NMT systems with statistical MT by describing three studies using automatic and human evaluation methods. The automatic evaluation results reported for NMT are very promising; human evaluations, however, show mixed results. We report increases in fluency but inconsistent results for adequacy and post-editing effort. NMT undoubtedly represents a step forward for the MT field, but one that the community should be careful not to oversell.
Human rating of predicted post-editing effort is a common activity and has been used to train confidence estimation models. However, the correlation between such ratings and actual post-editing effort remains under-measured, and the impact of presenting effort indicators in a post-editing user interface on actual post-editing effort has hardly been researched. In this study, ratings of perceived post-editing effort are tested for correlation with actual temporal, technical, and cognitive post-editing effort. In addition, the impact of presenting post-editing effort indicators in the user interface on post-editing effort is also tested. The language pair involved in this study is English-Brazilian Portuguese. Our findings, based on a small sample, suggest that there is little agreement between raters on predicted post-editing effort and that the correlations between actual and predicted effort are only moderate, making such ratings an inefficient basis for MT confidence estimation. Moreover, presenting post-editing effort indicators in the user interface appears not to affect actual post-editing effort.
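Correlations between ordinal effort ratings and measured effort of the kind described above are typically computed with a rank correlation such as Spearman's rho. A minimal pure-Python sketch follows; the sample ratings and timing values are illustrative inventions, not data from the study:

```python
def ranks(values):
    """Assign average ranks (1-based), handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical example: perceived-effort ratings (1-5) vs. seconds spent editing
perceived = [3, 2, 4, 1, 5]
seconds = [120, 95, 180, 60, 200]
rho = spearman(perceived, seconds)  # 1.0 here, since the rankings agree exactly
```

A "moderate" correlation in the study's sense would correspond to rho values well below 1, e.g. around 0.3-0.6; in practice one would use a library routine such as `scipy.stats.spearmanr`, which also reports a p-value.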
The use of neural machine translation (NMT) in a professional scenario poses a number of challenges, despite growing evidence that, for language combinations such as English to Spanish, NMT output quality has already outperformed statistical machine translation in terms of automatic metric scores. This article presents the results of an empirical test that aims to shed light on the differences between NMT post-editing and translation with the aid of a translation memory (TM). The results show that NMT post-editing involves less editing than TM segments, but this editing appears to take more time, with the consequence that NMT post-editing does not seem to improve productivity as might have been expected. This may be because NMT segments show higher variability in quality and post-editing time than TM segments, which are 'more similar' on average. Finally, the results show that translators who perceived that NMT boosted their productivity actually performed faster than those who perceived that NMT slowed them down.