Lifelong language learning enables a language model to accumulate knowledge while training on a stream of text data. Recent research on lifelong language learning relies on samples of previous tasks drawn from an episodic memory or a generative model. LAMOL, a representative generative-model-based lifelong language learning model, preserves previous information with generated pseudo-old samples, which is suboptimal. In this paper, we propose an improved version of LAMOL, MFK-LAMOL, which constructs generative replay using a more effective method. When a new task arrives, MFK-LAMOL replays sufficient previous data and retrieves important examples to train alongside the new task. Specifically, it selects the examples containing the most forgotten knowledge from previous tasks, measured by how much of the knowledge they carry has been forgotten after learning new information. We show that the proposed method outperforms LAMOL on a stream of three different natural language processing tasks.

INDEX TERMS Lifelong language learning, natural language processing, catastrophic forgetting, a stream of text data, generative replay
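The selection criterion described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: it assumes per-example losses are measured on pseudo-old samples before and after training on the new task, and that "most forgotten" means the largest loss increase. The function name and all parameters are hypothetical.

```python
def select_most_forgotten(examples, loss_before, loss_after, k):
    """Return the k examples whose loss increased the most after new-task training.

    examples    -- list of pseudo-old training examples
    loss_before -- per-example loss measured before learning the new task
    loss_after  -- per-example loss measured after learning the new task
    k           -- number of most-forgotten examples to replay
    """
    # Forgetting score: positive when the model got worse on the example.
    scores = [after - before for before, after in zip(loss_before, loss_after)]
    # Rank examples by descending forgetting score and keep the top k.
    ranked = sorted(range(len(examples)), key=lambda i: scores[i], reverse=True)
    return [examples[i] for i in ranked[:k]]
```

Under this sketch, the selected examples would be mixed into the training batch for the new task so that the most-forgotten previous knowledge is rehearsed first.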