“…Since the datasets contain a training set and a test set with disjoint writers, the performance of writer retrieval approaches is evaluated by using each document of the test set as a query once. On modern datasets, neural networks trained in a supervised manner dominate [10,14,20,22]; on historical datasets, however, training on writer-label information [19,25] trails either unsupervised methods [5] or approaches based on handcrafted features [16]. Historical data introduces additional challenges, e.g., degradation, different languages, the amount of text, or even potential writer-label noise caused by external influences on handwriting, such as the pen used.…”
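
The evaluation protocol described in the excerpt, where each test document serves as a query once, can be sketched roughly as follows. This is a minimal illustration and not the cited authors' implementation: the function name `leave_one_out_retrieval`, the use of cosine distance, and the reported metrics (top-1 accuracy and mean average precision) are assumptions that the excerpt does not state. The sketch also assumes that every document is already represented by a non-zero feature vector.

```python
import numpy as np


def leave_one_out_retrieval(embeddings, writer_ids):
    """Use each document as a query once and rank all remaining documents.

    Hypothetical helper: cosine distance and the two metrics are assumptions.
    Returns (top1_accuracy, mean_average_precision).
    """
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(writer_ids)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # L2-normalize each row
    dist = 1.0 - X @ X.T                              # pairwise cosine distance
    np.fill_diagonal(dist, np.inf)                    # a query must not retrieve itself

    top1_hits, average_precisions = [], []
    for q in range(len(y)):
        # The query itself has distance inf, so it sorts last and is dropped.
        ranking = np.argsort(dist[q], kind="stable")[:-1]
        relevant = y[ranking] == y[q]
        if not relevant.any():
            continue  # writer has only one document, so nothing can be retrieved
        top1_hits.append(bool(relevant[0]))
        hit_ranks = np.flatnonzero(relevant) + 1      # 1-based ranks of relevant documents
        precisions = np.arange(1, len(hit_ranks) + 1) / hit_ranks
        average_precisions.append(precisions.mean())
    return float(np.mean(top1_hits)), float(np.mean(average_precisions))


if __name__ == "__main__":
    # Toy data: four documents, two writers, 2-D feature vectors.
    features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(leave_one_out_retrieval(features, [0, 0, 1, 1]))
```

Queries whose writer has no other document in the test set are skipped here, since they have no relevant item to retrieve; whether a given benchmark handles such queries this way is likewise an assumption.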