The cBAD competition aims at benchmarking state-of-the-art baseline detection algorithms. It is in line with previous competitions such as the ICDAR 2013 Handwriting Segmentation Contest. A new, challenging dataset was created to test the behavior of state-of-the-art systems on real-world data. Since traditional evaluation schemes are not applicable to the size and modality of this dataset, we present a new scheme that uses baselines to measure performance. We received submissions from five different teams for both tracks.
Text line detection is crucial for any application associated with Automatic Text Recognition or Keyword Spotting. Modern algorithms perform well on well-established datasets, since these comprise either clean data or simple/homogeneous page layouts. We have collected and annotated 2036 archival document images from different locations and time periods. The dataset contains varying page layouts and degradations that challenge text line segmentation methods. Well-established text line segmentation evaluation schemes such as the Detection Rate or Recognition Accuracy require binarized data that is annotated at the pixel level. Producing ground truth by these means is laborious and not needed to determine a method's quality. In this paper we propose a new evaluation scheme that is based on baselines. The proposed scheme does not need binarization and can handle skewed as well as rotated text lines. The ICDAR 2017 Competition on Baseline Detection and the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts used this evaluation scheme. Finally, we present results achieved by a recently published text line detection algorithm.
The ICDAR 2017 Competition on Historical Document Writer Identification is dedicated to recording the most recent advances made in the field of writer identification. The goal of the writer identification task is the retrieval of pages that have been written by the same author. The test dataset used in this competition consists of 3600 handwritten pages originating from the 13th to the 20th century. It contains manuscripts from 720 different writers, where each writer contributed five pages. Five different institutions submitted six methods, which were ranked using identification and retrieval metrics. The paper describes the competition details, including the dataset and the evaluation measures used, and gives a short description of each submitted method.
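The abstract above does not say which identification and retrieval metrics were used. As an illustration only, the following minimal sketch assumes two common choices, leave-one-out top-1 accuracy and mean average precision (mAP), computed from a pairwise distance matrix between page descriptors. The function name `retrieval_scores` and its inputs are hypothetical and are not taken from the competition.

```python
import numpy as np


def retrieval_scores(dist, writers):
    """Leave-one-out top-1 accuracy and mean average precision.

    dist    -- (n, n) array of pairwise distances between pages (assumed input)
    writers -- length-n sequence of writer labels, one per page
    """
    dist = np.asarray(dist, dtype=float)
    writers = np.asarray(writers)
    top1_hits, avg_precisions = [], []
    for q in range(len(writers)):
        # Rank all other pages by increasing distance to the query page.
        order = np.argsort(dist[q], kind="stable")
        order = order[order != q]
        relevant = writers[order] == writers[q]
        if not relevant.any():
            continue  # no other page by this writer, so the query is skipped
        top1_hits.append(bool(relevant[0]))
        # Precision at every rank where a page by the same writer appears.
        ranks = np.flatnonzero(relevant) + 1
        precisions = np.arange(1, len(ranks) + 1) / ranks
        avg_precisions.append(precisions.mean())
    return float(np.mean(top1_hits)), float(np.mean(avg_precisions))


# Toy data: four pages described by one made-up feature value each.
features = np.array([0.0, 0.1, 1.0, 1.1])
toy_dist = np.abs(features[:, None] - features[None, :])
top1, mean_ap = retrieval_scores(toy_dist, [0, 0, 1, 1])
```

With five pages per writer, as in the test set described above, each query page would have four relevant pages among the remaining 3599, so average precision rewards methods that rank all four near the top rather than only the nearest one.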