“…But OCR is a technology still in the making, and available software provides varying levels of accuracy. The best results are usually obtained with a tailored solution involving corpus-specific pre-processing (Bieniecki, Grabowski, and Rozenberg 2007; Dengel et al. 1997; Holley 2009; Lat and Jawahar 2018; Volk, Furrer, and Sennrich 2011; Wemhoener, Yalniz, and Manmatha 2013), model training (Boiangiu et al. 2016; Reul et al. 2018; Springmann et al. 2014; Wick, Reul, and Puppe 2018), or post-processing (Kissos and Dershowitz 2016; Strohmaier et al. 2003; Thompson, McNaught, and Ananiadou 2015), but such procedures can be labour-intensive. Pretrained, general OCR processors have a much higher potential for wide adoption in the scholarly community, and hence their out-of-the-box performance is of scientific interest.…”