Extra knowledge, e.g., pre-trained embeddings (Song et al., 2017; Song and Shi, 2018) and pre-trained models (Devlin et al., 2019; Diao et al., 2019), can provide useful information and thus enhance model performance for many NLP tasks (Tian et al., 2020a,b,c). Memory and memory-augmented neural networks (Zeng et al., 2018; Santoro et al., 2018; Diao et al., 2020; Tian et al., 2020d) are another line of related research, which can be traced back to Weston et al. (2015), who proposed memory networks to leverage extra information for question answering; Sukhbaatar et al. (2015) later improved them with an end-to-end design, allowing the model to be trained with less supervision. For the Transformer in particular, memory-based methods have also been proposed.
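To make the end-to-end design concrete, the sketch below shows a single memory hop in the style of Sukhbaatar et al. (2015): the query attends softly over memory slots, so the whole network can be trained from answer supervision alone rather than from labeled supporting facts. This is a minimal PyTorch illustration under assumed names and dimensions (MemoryHop, vocab_size, embed_dim), not the configuration of any cited work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryHop(nn.Module):
    """One hop of an end-to-end memory network (after Sukhbaatar et al., 2015).

    Each memory slot and the query are encoded as a sum of word embeddings
    (bag-of-words, as in the original model); all sizes here are illustrative.
    """
    def __init__(self, vocab_size: int, embed_dim: int):
        super().__init__()
        self.embed_A = nn.EmbeddingBag(vocab_size, embed_dim, mode="sum")  # input memory
        self.embed_C = nn.EmbeddingBag(vocab_size, embed_dim, mode="sum")  # output memory
        self.embed_B = nn.EmbeddingBag(vocab_size, embed_dim, mode="sum")  # query encoder

    def forward(self, memories: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # memories: (num_slots, slot_len) token ids; query: (1, query_len) token ids
        m = self.embed_A(memories)          # (num_slots, embed_dim) memory keys
        c = self.embed_C(memories)          # (num_slots, embed_dim) memory values
        u = self.embed_B(query)             # (1, embed_dim) query state
        p = F.softmax(u @ m.t(), dim=-1)    # soft attention over memory slots
        o = p @ c                           # (1, embed_dim) attended memory output
        return u + o                        # updated state, fed to the next hop or classifier

# illustrative usage with random token ids (vocabulary and sizes are made up)
hop = MemoryHop(vocab_size=100, embed_dim=32)
memories = torch.randint(0, 100, (10, 6))   # 10 memory slots, 6 tokens each
query = torch.randint(0, 100, (1, 6))
state = hop(memories, query)                # (1, 32) updated controller state
```

Because the attention weights `p` are a differentiable softmax rather than a hard selection of supporting facts, the model needs only question-answer pairs for training, which is the "less supervision" property noted above.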