Fine-tuning pre-trained deep networks is a practical way of benefiting from representations learned on a large database while having relatively few examples to train a model. This adjustment is nowadays routinely performed so as to benefit from the latest improvements of convolutional neural networks trained on large databases. Fine-tuning requires some form of regularization, which is typically implemented by weight decay, driving the network parameters towards zero. This choice conflicts with the motivation for fine-tuning: starting from a pre-trained solution is meant to take advantage of previously acquired knowledge. Hence, regularizers promoting an explicit inductive bias towards the pre-trained model have recently been proposed. This paper demonstrates the versatility of this type of regularizer across transfer learning scenarios. We replicated experiments on three state-of-the-art approaches in image classification, image segmentation, and video analysis to compare the relative merits of regularizers. These tests show systematic improvements compared to weight decay. Our experimental protocol puts forward the versatility of a regularizer that is easy to implement and operate, and which we eventually recommend as the new baseline for future approaches to transfer learning relying on fine-tuning.
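To make the contrast with weight decay concrete, below is a minimal PyTorch sketch of such a regularizer, which penalizes the squared distance to the pre-trained weights rather than to zero (often referred to as L²-SP). The toy model, the coefficient `alpha`, and the training loop are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Toy "pre-trained" model; in practice this would be e.g. a torchvision ResNet.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Snapshot the pre-trained weights once, before fine-tuning starts.
ref = {name: p.detach().clone() for name, p in model.named_parameters()}

def l2_sp_penalty(model, ref, alpha=1e-2):
    # Penalize the distance to the pre-trained weights instead of to zero;
    # with ref set to zeros this reduces to ordinary weight decay.
    return alpha * sum(((p - ref[n]) ** 2).sum()
                       for n, p in model.named_parameters())

# No built-in weight_decay: the explicit penalty replaces it.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y) + l2_sp_penalty(model, ref)
    loss.backward()
    optimizer.step()
```

Adding the penalty to the loss, rather than using the optimizer's `weight_decay` argument, makes the inductive bias explicit: the minimizer is pulled towards the pre-trained solution instead of towards the origin.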