2018 13th IAPR International Workshop on Document Analysis Systems (DAS), 2018
DOI: 10.1109/das.2018.38

READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

Abstract: Text line detection is crucial for any application associated with Automatic Text Recognition or Keyword Spotting. Modern algorithms perform well on well-established datasets, since these comprise either clean data or simple, homogeneous page layouts. We have collected and annotated 2036 archival document images from different locations and time periods. The dataset contains varying page layouts and degradations that challenge text line segmentation methods. Well-established text line segmentation evaluation sche…

Cited by 58 publications (49 citation statements)
References 22 publications
“…most characters rest upon and descenders extend below". The READ-BAD dataset [22] has been used for the cBAD: ICDAR2017 Competition [8].…”
Section: B Baseline Detection (citation type: mentioning)
confidence: 99%
“…There are many possibilities to determine if two text lines are close to each other or not. Here, the well-established method of [3] is used. We say two text lines are close, if their baselines are geometrically close to each other (see Section III-B for details).…”
Section: B Using Geometric Information As Restriction (citation type: mentioning)
confidence: 99%
“…To define the neighborhood of G_x we use a method that compares the so-called baselines of the text lines. This is a common measure to evaluate the performance of a layout analysis result [3]. We call the tuple of two-dimensional points B = (B_1, …”
Section: B Restricting By Geometric Position (citation type: mentioning)
confidence: 99%