ICPR 2020 Competition on Text Block Segmentation on a NewsEye Dataset

Michael, Johannes; Weidemann, Max; Laasch, Bastian; Labahn, Roger

doi:10.1007/978-3-030-68793-9_30

Cited by 6 publications

(4 citation statements)

References 21 publications


“…The detection of text lines has been widely explored in historical manuscript text books [26,9] and other historical documents of different natures, such as newspapers [25], meteorological tables [1] finding aids [33], as well as many other supports. With index tables, one can consider the issue as a two-class image segmentation task: we separate text lines from the background.…”

Section: Document Image Analysis (mentioning)

confidence: 99%

“…These models have been evaluated and compared on COCO challenges [16], and are fully integrated in popular toolkits such as Detectron2 or LayoutParser. Mask-RCNN has been prior used for document understanding such as on historical newspapers [25]. In contrast, YOLACT and YOLACT++ [10] are single-step approaches focusing on efficiency and increasing the number of frames per second (FPS), a metric that indicates the number of images processed in one second.…”

Section: Document Image Analysis (mentioning)

confidence: 99%


Text Line Detection in Historical Index Tables: Evaluations on a New French PArish REcord Survey Dataset (PARES)

Bernard, Wall, Boillet et al. 2023

Lecture Notes in Computer Science


In this paper, we address the challenge of document image analysis for historical index table documents with handwritten records. Demographic studies can gain insight from the use of automatic document analysis in such documents through the study of population movements. To evaluate the efficacy of automatic layout analysis tools, we release the PARES dataset [6], which contains 250 labeled index table images originating from French archives. Also, we run state-of-the-art algorithms (U-FCN, R-CNN and Transformers) in order to detect the lines within index tables, a common prerequisite for handwritten text recognition (HTR). Our results indicate that text line extraction works well with the U-FCN model, while also indicating that Transformer architectures show promise for accurate text line detection in such historical documents with great efficiency. This is an encouraging step towards a Transformer-based architecture for both layout and content detection. This process and dataset represent a first step to automatically analyze handwritten and historical index tables. In addition to this paper and the PARES [6] dataset of historical index tables of 250 images, we release segmentation masks, the code we used to train and test the models, and the models themselves.




“…-Text Recognition (Michael et al, 2019) and Article Separation (Michael et al, 2020), extracting the layout of newspapers (e.g. articles and graphical regions) from digitized newspapers and transforming the content to textual format, providing full articles through automatic layout analysis, text recognition and article separation.…”

Section: The NewsEye Project (mentioning)

confidence: 99%

An OER on digital historical research on European historical newspapers with the NewsEye platform

Suire, Sidère, Doucet 2023

EFI


In this article, we introduce an Open Education Resource (OER) on digital historical research with historical newspapers (the URL will be given with the camera-ready version of this paper), intended to give students the means to understand the induced risks in working with large collections of digitised documents, as well as the keys to benefit from the advances of natural language processing over large multilingual collections of European historical newspapers. This resource exploits results of the NewsEye Horizon 2020 research and innovation project. It is part of a set of 7 OERs developed and shared within the Erasmus+ project Digital Methods Platform for Arts and Humanities (DiMPAH).


“…Recent studies further improve by introducing border or counter awareness [47,42,56,8], local refinement [51,11], deformation convolution [39,43], Bezier curve [22], etc. Besides, document layout analysis [7,54,12,26,24] have been studied for years that usually take reading order of texts in document as consideration.…”

Section: Scene Text Detection (mentioning)

confidence: 99%

Contextual Text Block Detection towards Scene Text Understanding

Xue, Huang, Lu et al. 2022

Preprint


Most existing scene text detectors focus on detecting characters or words that only capture partial text messages due to missing contextual information. For a better understanding of text in scenes, it is more desired to detect contextual text blocks (CTBs) which consist of one or multiple integral text units (e.g., characters, words, or phrases) in natural reading order and transmit certain complete text messages. This paper presents contextual text detection, a new setup that detects CTBs for better understanding of texts in scenes. We formulate the new setup by a dual detection task which first detects integral text units and then groups them into a CTB. To this end, we design a novel scene text clustering technique that treats integral text units as tokens and groups them (belonging to the same CTB) into an ordered token sequence. In addition, we create two datasets SCUT-CTW-Context and ReCTS-Context to facilitate future research, where each CTB is well annotated by an ordered sequence of integral text units. Further, we introduce three metrics that measure contextual text detection in local accuracy, continuity, and global accuracy. Extensive experiments show that our method accurately detects CTBs which effectively facilitates downstream tasks such as text classification and translation. The project is available at https://sg-vilab.github.io/publication/xue2022contextual/.
