“…Despite the great success of these sequence-to-sequence models, they translate in a sentence-by-sentence manner, utilizing a large amount of sentence-level parallel data, while totally ignoring extra-sentential context information and inter-sentence consistency. This issue has attracted wide attention to context-aware translation recently, and many context-aware translation approaches [Wang et al., 2017; Tiedemann and Scherrer, 2017; Bawden et al., 2018; Voita et al., 2018; Maruf and Haffari, 2018; Kuang et al., 2018; Kuang and Xiong, 2018; Läubli et al., 2018; Miculicich et al., 2018; Voita et al., 2019c; Voita et al., 2019b; Xiong et al., 2019; Tan et al., 2019] are proposed.…”