“…In historical document layout analysis in particular, Xu et al (2018) relied on a Multi-Task Fully Convolutional Network (FCN) to segment highly unstructured manuscript and printed-text pages into multiple semantically relevant groups (e.g., marginalia, main text, and comments), while Ravichandra et al (2022) opt for an object detection-based approach relying on the YOLO model (Redmon et al, 2015). Others have recognized the value of extracting images from historical documents, given their importance in transmitting the information and ideas contained in the texts. This has led to approaches such as the FCNs presented in Monnier and Aubry (2020) and the object detection-based methodologies applied to specific corpora, adapted by Dutta et al (2021) and Büttner et al (2022) from techniques like YOLO (Redmon et al, 2015), U-Net (Ronneberger et al, 2015), or Faster R-CNN (Ren et al, 2016). Moving closer to the textual content of these documents, numerous AI-based approaches for optical character recognition (OCR) and handwritten text recognition (HTR) have been proposed, with deep learning-based approaches (Jaderberg et al, 2016) setting new standards.…”