“…Deep learning models are prone to catastrophic forgetting [20,30,48], i.e., training a model on new information interferes with previously learned knowledge and typically degrades performance severely. This phenomenon has been widely studied in image classification, and most current techniques fall into the following categories [10,48]: regularization approaches [5,32,73,13,36], dynamic architectures [69,64,35], parameter isolation [17,53,40], and replay-based methods [66,46,55,26]. Regularization-based approaches are by far the most widely employed and mainly come in two flavours: penalty computation, which discourages changes to parameters deemed important for previous tasks, and knowledge distillation [25].…”
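To make the two regularization flavours concrete, the sketch below (ours, not taken from the cited works; PyTorch assumed) shows an EWC-style quadratic penalty on important parameters and a temperature-softened distillation loss in the spirit of [25]. The names `fisher`, `old_params`, and `lam` are illustrative, and the Fisher estimate is assumed precomputed on old-task data.

```python
# Minimal sketch of the two regularization flavours, assuming PyTorch.
import torch
import torch.nn.functional as F


def snapshot_params(model):
    """Detached copy of the parameters, taken after training on the old task."""
    return {n: p.detach().clone() for n, p in model.named_parameters()}


def ewc_penalty(model, old_params, fisher, lam=1.0):
    """Penalty-computation flavour: (lam / 2) * sum_i F_i (theta_i - theta*_i)^2.

    `fisher` maps parameter names to a diagonal importance estimate, typically
    the squared gradients of the old-task log-likelihood (assumed precomputed).
    """
    loss = torch.zeros((), device=next(model.parameters()).device)
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss


def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Distillation flavour: KL between temperature-softened teacher and student."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    q = F.softmax(teacher_logits / T, dim=1)
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)
```

In use, either term would be added to the new-task loss, e.g. `F.cross_entropy(logits, labels) + ewc_penalty(model, old_params, fisher)`, with a frozen copy of the old model serving as the teacher for `distillation_loss`.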