“…The state-of-the-art NMT model employs an encoder-decoder architecture with an attention mechanism, in which the encoder summarizes the source sentence into a vector representation, the decoder generates the target sentence word by word from that representation, and the attention mechanism learns a soft alignment between each target word and the source words (Bahdanau et al., 2015). NMT systems have outperformed state-of-the-art SMT models on various language pairs in terms of translation quality (Luong et al., 2015; Bentivogli et al., 2016; Junczys-Dowmunt et al., 2016; Wu et al., 2016; Toral and Sánchez-Cartagena, 2017). However, owing to deficiencies of NMT systems such as limited vocabulary size and low adequacy for some translations, much research has focused on incorporating extra knowledge, such as SMT features or linguistic features, into NMT to improve translation performance (He et al., 2016; Sennrich and Haddow, 2016; Nadejde et al., 2017; Wang et al., 2017).…”
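To make the attention computation concrete, below is a minimal NumPy sketch of the additive attention of Bahdanau et al. (2015): alignment scores are computed from the previous decoder state and each encoder annotation, normalized by a softmax into soft alignment weights, and used to form a context vector for the next target word. All function names, dimensions, and random weights here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def additive_attention(decoder_state, encoder_states, W_a, U_a, v_a):
    """Bahdanau-style additive attention (a sketch, not the reference code).

    decoder_state:  (d,)    previous decoder hidden state s_{i-1}
    encoder_states: (T, h)  encoder annotations h_1..h_T
    Returns the context vector and the soft alignment weights.
    """
    # Alignment scores: e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)
    scores = np.tanh(encoder_states @ U_a.T + decoder_state @ W_a.T) @ v_a  # (T,)
    alpha = softmax(scores)            # soft alignment over source positions
    context = alpha @ encoder_states   # (h,) weighted sum of annotations
    return context, alpha

# Toy example; all dimensions below are hypothetical.
rng = np.random.default_rng(0)
T, h, d, a = 5, 8, 8, 6  # source length, encoder dim, decoder dim, attention dim
W_a = rng.standard_normal((a, d))
U_a = rng.standard_normal((a, h))
v_a = rng.standard_normal(a)
context, alpha = additive_attention(rng.standard_normal(d),
                                    rng.standard_normal((T, h)),
                                    W_a, U_a, v_a)
print(alpha)  # one weight per source word; the weights sum to 1
```

In decoding, this context vector is concatenated with the decoder state to predict the next target word, so the soft alignment weights play the role that hard word alignments play in SMT.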