“…A closer analysis of the noisiest texts confirmed that these peaks stem primarily from errors in image segmentation: in many cases, the correct reading order was not respected, or text regions from different articles were incorrectly intermixed. Apart from these errors, however, the situation appeared quite promising, with a mean character error rate of 2-3%, which is generally considered a high standard of OCR quality (Fink, Schulz, and Springmann 2017) and which may not significantly influence a stylometric analysis (Eder 2012). For these reasons, instead of proceeding with a manual transcription of the TSZ articles, I decided simply to re-apply the OCR process, while improving its quality as much as possible.…”
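
The character error rate mentioned above is conventionally computed as the edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. The following is a minimal sketch under that assumption; the passage does not say which tool or exact definition was used, and the function names and sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to save memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference.

    `reference` is the ground-truth transcription, `hypothesis` the OCR output.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical sample: one substituted character in a short line.
    truth = "Tiroler Soldaten-Zeitung"
    ocr = "Tiroler Soldaten-Zeitnng"
    print(f"CER: {character_error_rate(truth, ocr):.2%}")
```

Under this definition, a rate of 2-3% corresponds to roughly two or three wrong characters per hundred characters of ground truth. Note that segmentation errors such as a wrong reading order can inflate the figure well beyond what the recognition errors alone would suggest, since whole misplaced regions count as insertions and deletions.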