“…Starting with GPT (Radford et al., 2018), generative pre-training had already achieved state-of-the-art results in generic and personalised open-domain dialogue (Wolf et al., 2019), but it was the introduction of encoder-decoder pre-trained models such as BART (Lewis et al., 2020a) and T5 (Raffel et al., 2020) …

Knowledge selection (KS) approaches with reported selection performance (Seen / Unseen test splits):

Model | Approach | Seen | Unseen
(Li et al., 2019b) | Incremental transformer | - | -
PostKS (Lian et al., 2019) | Posterior signal | 22.5 | 15.8
KIC (Lin et al., 2020) | Soft selection in decoder | - | -
DKS | Posterior signal + topic drift | - | -
SKT (Kim et al., 2020) | Sequential latent knowledge selection | 26.8 | 18.3
DiffKS | Difference aware | 25.6 | 20.1
DukeNet (Meng et al., 2020) | Knowledge tracking & shifting | 26.4 | 19.6
SKT+ | SKT + posterior signal + distillation | 27.7 | 19.4
MIKe (Meng et al., 2021) | Initiative aware | 28.4 | 21.5
SKT-KG (Zhan et al., 2021b) | Knowledge transition with CRF | 26 | -
KMine* (Lotfi et al., 2021) | Posterior signal via generation | 27.9 | 27.0
CoLV (Zhan et al., 2021a) | Collaborative latent spaces | 30.1 | 18.9
DIALKI | Dialogue-knowledge contextualization | 32.9 | 35.5
DSG (Li et al., 2022) | Document semantic graph | 29.4 | 30.8
TAKE | Modeling topic shift | 28.8 | 25.8
CorefDiffs† (Xu et al., 2022c) | Co-referential and differential knowledge flow | 42.4 | 41.4
GENKS (Sun et al., …) | … | … | …

With the incremental integration of the KS and RG tasks, knowledge grounding has gradually become a fine-grained process that can happen at the token level during decoding. RAG (Lewis et al., 2020b) and FiD (Izacard and Grave, 2020), both originally developed for abstractive QA, are examples of this approach, which leaves the final, fine-grained knowledge selection to the decoder.…”
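To make the idea of decoder-side, token-level grounding concrete, the sketch below illustrates a Fusion-in-Decoder-style setup in plain PyTorch: each retrieved passage is encoded independently together with the dialogue context, the encoder states are concatenated, and a single decoder cross-attends over the fused memory, so the choice of which knowledge to use is made implicitly by decoder attention at every generation step. This is a minimal illustration under assumed dimensions and module names (ToyFiD, d_model, etc.), not the original FiD or RAG implementation.

```python
import torch
import torch.nn as nn

# Minimal FiD-style sketch (illustrative only; hyper-parameters and names are
# assumptions, not the original FiD code). Each retrieved passage is encoded
# separately; the decoder attends over the concatenation of all encodings.
class ToyFiD(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, passage_ids, target_ids):
        # passage_ids: (n_passages, src_len), each row = dialogue context + one passage
        # Encode every (context, passage) pair independently.
        memory = self.encoder(self.embed(passage_ids))        # (n_passages, src_len, d)
        # Fuse: flatten all passage encodings into one long memory sequence.
        memory = memory.reshape(1, -1, memory.size(-1))       # (1, n_passages*src_len, d)
        # Causal mask so each response token only sees earlier tokens.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(target_ids.size(1))
        # The decoder cross-attends over the fused memory at every step,
        # i.e. soft knowledge selection happens while decoding.
        hidden = self.decoder(self.embed(target_ids), memory, tgt_mask=tgt_mask)
        return self.lm_head(hidden)                           # (1, tgt_len, vocab)

model = ToyFiD()
passages = torch.randint(0, 32000, (5, 64))   # 5 retrieved passages, 64 tokens each
target   = torch.randint(0, 32000, (1, 16))   # response prefix being decoded
logits = model(passages, target)
print(logits.shape)                            # torch.Size([1, 16, 32000])
```

RAG realises the same principle differently, marginalising the generator's output distribution over the retrieved documents (per token in its token-level variant); in both cases the final decision about which knowledge grounds the response is deferred to generation time rather than made by a separate selector.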