Historical text normalization suffers from small datasets that exhibit high variance, and previous work has shown that multi-task learning can be used to leverage data from related problems in order to obtain more robust models. However, previous work has been limited to datasets from a specific language and a specific historical period, and it is not clear whether its results generalize. It therefore remains an open problem under what conditions historical text normalization benefits from multi-task learning. We explore the benefits of multi-task learning across 10 different datasets, representing different languages and periods. Our main finding, contrary to what has been observed for other NLP tasks, is that multi-task learning mainly helps when target task data is very scarce.