Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-short.47

Adaptive Nearest Neighbor Machine Translation

Abstract: kNN-MT, recently proposed by Khandelwal et al. (2020a), successfully combines a pretrained neural machine translation (NMT) model with token-level k-nearest-neighbor (kNN) retrieval to improve translation accuracy. However, the traditional kNN algorithm used in kNN-MT simply retrieves the same number of nearest neighbors for each target token, which may cause prediction errors when the retrieved neighbors include noise. In this paper, we propose Adaptive kNN-MT to dynamically determine the number of neighbors k for each target token.
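The abstract describes kNN-MT's core mechanism (interpolating the NMT model's output distribution with a distribution built from retrieved nearest neighbors) and an adaptive choice of k. Below is a minimal sketch of that mechanism, assuming a pre-built datastore of decoder states; the tensor names, the simple distance-based pruning rule, and the fixed interpolation weight are illustrative placeholders, not the paper's learned Meta-k Network.

```python
import torch
import torch.nn.functional as F

def knn_mt_step(nmt_probs, context_vec, datastore_keys, datastore_vals,
                vocab_size, k_max=8, lambda_=0.5, temperature=10.0):
    """One decoding step of (adaptive) kNN-MT: interpolate the NMT
    distribution with a distribution built from retrieved neighbors.

    nmt_probs:       [vocab_size]  softmax output of the NMT model
    context_vec:     [dim]         decoder hidden state used as query
    datastore_keys:  [N, dim]      cached decoder states (keys)
    datastore_vals:  [N]           target-token ids aligned with the keys
    """
    # Retrieve the k_max nearest neighbors by Euclidean (L2) distance.
    dists = torch.cdist(context_vec[None, None], datastore_keys[None]).squeeze()
    top_d, top_idx = dists.topk(k_max, largest=False)

    # Adaptive part (illustrative only): keep neighbors whose distance is
    # close to the best one; the paper instead learns how many neighbors
    # to use with a light-weight Meta-k Network.
    keep = top_d <= top_d[0] * 1.5
    top_d, top_idx = top_d[keep], top_idx[keep]

    # Turn distances into a probability distribution over the vocabulary.
    weights = F.softmax(-top_d / temperature, dim=-1)
    knn_probs = torch.zeros(vocab_size)
    knn_probs.scatter_add_(0, datastore_vals[top_idx], weights)

    # Interpolate the retrieval distribution with the NMT distribution.
    return lambda_ * knn_probs + (1.0 - lambda_) * nmt_probs
```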

Cited by 37 publications (66 citation statements)
References 13 publications (9 reference statements)
“…In the following, we will describe in detail the datastore construction and inference process of Policy-KNN. At the t-th decoding step, we denote the retrieval results from Token-KNN as {(k_i^tok, v_i^tok)}_{i=1}^K. Following Zheng et al. (2021), we construct the key vector s_t using two kinds of features extracted from the retrieval results of Token-KNN. One feature is the distance: we denote the Euclidean distance between the context representation h_t and the i-th retrieval result k_i^tok as d_i.…”
Section: Policy-KNN
confidence: 99%
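A rough sketch of how the features for such a key vector could be computed, assuming the K retrieved keys and their target tokens are stacked into tensors; the second feature (the number of distinct target tokens among the top-i neighbors, following Zheng et al. 2021) is an assumption here, since the excerpt names only the distance feature.

```python
import torch

def build_key_vector(h_t, retrieved_keys, retrieved_vals):
    """Build a key vector s_t from Token-KNN retrieval results.

    h_t:            [dim]     context representation at decoding step t
    retrieved_keys: [K, dim]  the K retrieved datastore keys k_i^tok
    retrieved_vals: [K]       target-token ids attached to those keys
    """
    # Feature 1: Euclidean distance d_i between h_t and each retrieved key.
    d = torch.norm(retrieved_keys - h_t[None, :], dim=-1)            # [K]

    # Feature 2 (assumed, following Zheng et al. 2021): number of distinct
    # target tokens among the top-i neighbors, for each prefix i.
    c = torch.tensor([retrieved_vals[: i + 1].unique().numel()
                      for i in range(retrieved_vals.numel())],
                     dtype=d.dtype)                                    # [K]

    # Concatenate the two feature groups into s_t.
    return torch.cat([d, c], dim=-1)                                   # [2K]
```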
“…We apply the FAIRSEQ (Ott et al. 2019) toolkit for the NMT implementation, and Faiss (Johnson, Douze, and Jégou 2017) for efficient kNN retrieval. Following previous work (Zheng et al. 2021; Khandelwal et al. 2020), we employ the winning model of the WMT19 German-English news translation task (Ng et al. 2019) as the pre-trained model. K is set to 8 for both Token-KNN and Policy-KNN.…”
Section: Experiments (Experimental Setup)
confidence: 99%
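A minimal sketch of the Faiss retrieval side of this setup, under the assumption that a datastore of decoder hidden states has already been extracted with the pre-trained FAIRSEQ model; the file names, the flat L2 index, and the hidden size are illustrative, while K = 8 follows the quoted setup.

```python
import numpy as np
import faiss

DIM = 1024   # decoder hidden size (illustrative)
K = 8        # number of neighbors, as in the quoted setup

# Datastore: keys are decoder hidden states, values are target-token ids.
# The .npy files are hypothetical placeholders for an already-built datastore.
keys = np.load("datastore_keys.npy").astype("float32")    # [N, DIM]
vals = np.load("datastore_vals.npy")                      # [N]

# Exact L2 index; large datastores typically use a quantized/IVF index for speed.
index = faiss.IndexFlatL2(DIM)
index.add(keys)

# Query with the current decoder state(s) and retrieve the K nearest entries.
queries = np.random.rand(1, DIM).astype("float32")        # stand-in for a real hidden state
distances, ids = index.search(queries, K)                  # both [1, K]
neighbor_tokens = vals[ids[0]]                             # target tokens of the neighbors
```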