“…Convolutional neural networks have been used extensively on document images, e.g. segmenting documents [Yang et al, 2017, Chen et al, 2017a, Wick and Puppe, 2018, spotting handwritten words [Sudholt and Fink, 2016], classifying documents [Kang et al, 2014] and more broadly detecting text in natural scenes [Liao et al, 2017, Borisyuk et al, 2018. In contrast to our task, these are trained on explicitly labeled datasets with information on where the targets are, e.g.…”