“…Finally, we apply our matching framework to dirty firm-level financial data that we extracted from historical archives using Optical Character Recognition (OCR) software (Kamlah et al., 2022). The data cover German firms operating between 1910 and 1919, with non-harmonized and non-standardized attributes taken from the "Handbuch der deutschen Aktiengesellschaften" (see also Gram et al., 2022). In a 5-fold cross-validation with random 30% train and 70% test sample splits, our framework achieves an average F-score of 99.36 on the test sub-sample.…”
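
The evaluation protocol mentioned in the excerpt can be illustrated with a minimal sketch. It assumes that the "5-fold" setup means five independent random splits, each using 30% of the labeled candidate pairs for training and 70% for testing, with the F-score (taken here as F1) averaged over the test portions. The excerpt does not spell this out, so that reading is an assumption. The threshold matcher `fit_threshold`, the helper `repeated_split_f_score`, and the synthetic similarity scores are hypothetical stand-ins and are not the framework or the data described above.

```python
import random
from statistics import mean


def f_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall for the positive (match) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_threshold(scores, labels):
    """Hypothetical stand-in matcher: pick the similarity cut-off with the best training F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: f_score(labels, [s >= c for s in scores]))


def repeated_split_f_score(scores, labels, n_repeats=5, train_frac=0.3, seed=0):
    """Average test F1 over n_repeats random train/test splits (assumed protocol)."""
    rng = random.Random(seed)
    indices = list(range(len(scores)))
    n_train = int(round(train_frac * len(indices)))
    results = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        cut = fit_threshold([scores[i] for i in train], [labels[i] for i in train])
        preds = [scores[i] >= cut for i in test]
        results.append(f_score([labels[i] for i in test], preds))
    return mean(results), results


# Synthetic candidate pairs (illustrative only): matches tend to score high,
# non-matches tend to score low, with some overlap between the two groups.
data_rng = random.Random(42)
labels = [i % 4 == 0 for i in range(400)]
scores = [data_rng.uniform(0.6, 1.0) if y else data_rng.uniform(0.0, 0.7) for y in labels]

avg_f, per_split = repeated_split_f_score(scores, labels)
print(f"average test F-score over {len(per_split)} splits: {avg_f:.4f}")
```

Training on the smaller share and testing on the larger one, as in the 30/70 split, makes the reported average a fairly demanding measure of how well a matcher generalizes from few labeled pairs.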