2019 International Conference on Document Analysis and Recognition Workshops (ICDARW)
DOI: 10.1109/icdarw.2019.40078

A Computationally Efficient Pipeline Approach to Full Page Offline Handwritten Text Recognition

Abstract: Offline handwriting recognition with deep neural networks is usually limited to words or lines due to large computational costs. In this paper, a less computationally expensive full page offline handwritten text recognition framework is introduced. This framework includes a pipeline that locates handwritten text with an object detection neural network and recognises the text within the detected regions using features extracted with a multi-scale convolutional neural network (CNN) fed into a bidirectional long …
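The two-stage pipeline the abstract describes (locate text regions with a detector, then recognise each cropped region) can be sketched as below. This is a minimal structural illustration, not the paper's implementation: `detect_lines` and `recognize_line` are hypothetical stand-ins for the object-detection network and the CNN-BiLSTM recogniser.

```python
def recognize_page(image, detect_lines, recognize_line):
    """Two-stage sketch: detect text regions, then recognise each crop.

    image          -- 2D list of pixel values (rows of columns)
    detect_lines   -- hypothetical detector: image -> [(x1, y1, x2, y2), ...]
    recognize_line -- hypothetical recogniser: cropped image -> text string
    """
    boxes = detect_lines(image)
    # Impose a simple top-to-bottom, left-to-right reading order on the boxes.
    boxes.sort(key=lambda b: (b[1], b[0]))
    lines = []
    for x1, y1, x2, y2 in boxes:
        # Crop the detected region and hand it to the recogniser.
        crop = [row[x1:x2] for row in image[y1:y2]]
        lines.append(recognize_line(crop))
    return "\n".join(lines)
```

The split keeps the expensive recognition model off the full page: it only ever sees small crops, which is the source of the computational savings the abstract claims.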

Cited by 25 publications (17 citation statements)
References 16 publications
“…From left to right, the columns respectively denote the architecture, the number of trainable parameters, the maximum GPU memory usage during training (data augmentation included), the minimum transcription level required, the minimum segmentation level required, the use of PreTraining (PT) on subimages, the use of specific Curriculum Learning (CL) and finally the Hyperparameter Adaptation (HA) requirements from one dataset to another. As one can see, models from [4,6,21] require transcription and segmentation labels at word or line levels to be trained, which implies more costly annotations. The models from [1,2,7] and the SPAN are pretrained on text line images to speed up convergence and to reach better results, thus also using line segmentation and transcription labels even if it is not strictly necessary.…”
Section: Results
mentioning confidence: 99%
“…Among these approaches, [4,3,6] are based on object-detection methods: a Region Proposal Network (RPN), followed by a non-maximal suppression process and Region Of Interest (ROI), generates line or word bounding boxes. An OCR is then applied on these bounding boxes.…”
Section: Segmentation-based Approaches
mentioning confidence: 99%
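The RPN-then-NMS step this citation describes can be sketched as standard greedy non-maximum suppression over IoU. This is a generic pure-Python illustration of the technique, not the cited papers' code; box coordinates are assumed to be `(x1, y1, x2, y2)`.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the best-scoring box, drop its heavy overlaps, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```

In the cited pipelines, the surviving boxes are the line or word regions that the OCR stage is then applied to.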
“…Object-detection approaches for word bounding box predictions have been studied in [25], [26]. They follow the standard object-detection paradigm based on a region proposal network and a non-maximum suppression algorithm [27].…”
Section: Document Layout Analysis
mentioning confidence: 99%
“…Ref. [41] proposed an offline technique for recognizing full handwritten documents. This technique comprises a localized (text localization) handwritten text pipeline.…”
Section: Handwriting Recognition Based On Deep Learning
mentioning confidence: 99%