“…Even if the improvement in OCR quality is considerable, the improved text can still be challenging for information retrieval engines, especially with short queries and articles, where the engine has less evidence for matching the query words against the collection data in its index (Järvelin et al., 2016; Mittendorf and Schäuble, 2000). In a recent study, Bazzo et al. (2020), for example, found that statistically significant degradation of search results begins already at a word error rate of 5% when Portuguese PDF texts with artificially induced errors were searched with a state-of-the-art query engine and algorithms. Similar results have been obtained in most current studies, as the recent survey by Nguyen et al. (2021) shows.…”