2023
DOI: 10.1109/tpami.2023.3235826

DAN: A Segmentation-Free Document Attention Network for Handwritten Document Recognition

Cited by 43 publications (34 citation statements)
References 45 publications
“…Many open-source architectures have been proposed for handwritten text recognition (HTR). This task is generally performed at line level, relying either on a CNN-HMM architecture [4], a CNN-RNN architecture [35], or a transformer-based architecture [13]. Recently, an increasing number of articles have been dedicated to learning from paragraphs or full pages [5,45,13].…”
Section: Handwritten Text Recognition (mentioning, confidence: 99%)
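The line-level CNN-RNN family mentioned in this statement is compact enough to sketch. Below is a minimal PyTorch sketch of a CRNN recognizer trained with CTC; the channel counts, hidden size, and charset size are illustrative assumptions, not values taken from the cited papers.

import torch.nn as nn

class CRNN(nn.Module):
    """Sketch of a line-level CNN-RNN recognizer for CTC training (illustrative)."""
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        # CNN backbone: extracts features and collapses the height of the line image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # -> (B, 256, 1, W')
        )
        # Recurrent part: models the left-to-right character sequence.
        self.rnn = nn.LSTM(256, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_classes + 1)  # +1 for the CTC blank

    def forward(self, x):  # x: (B, 1, H, W) grayscale text-line image
        f = self.cnn(x).squeeze(2).permute(0, 2, 1)  # (B, W', 256)
        out, _ = self.rnn(f)
        # Transpose to (W', B, C+1) before passing to nn.CTCLoss.
        return self.head(out).log_softmax(-1)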
“…Most of these approaches have been proposed as part of the 2017 Information Extraction in Historical Handwritten Records (IEHHR) competition [17]. Following the same principle, attention-based neural networks such as DAN [13] could also be used for information extraction from full pages, by taking advantage of special tokens for both semantic labelling and structure description.…”
Section: Handwritten Text Recognition (mentioning, confidence: 99%)
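The special-token idea can be made concrete: the target sequence interleaves layout and semantic tags with the transcription, so a single decoder jointly predicts text, structure, and field labels. A minimal sketch follows, with hypothetical tag names; the actual token vocabularies of DAN [13] and of the IEHHR competition [17] differ.

# Hypothetical tag vocabulary: the decoder's charset would contain these
# tags plus the ordinary characters of the transcription.
VOCAB = ["<page>", "</page>", "<paragraph>", "</paragraph>",
         "<name>", "</name>", "<occupation>", "</occupation>"]

def serialize_record(fields: dict) -> str:
    """Wrap each extracted field in semantic tags, inside layout tags."""
    body = "".join(f"<{k}>{v}</{k}>" for k, v in fields.items())
    return f"<page><paragraph>{body}</paragraph></page>"

print(serialize_record({"name": "Maria Lopez", "occupation": "weaver"}))
# <page><paragraph><name>Maria Lopez</name><occupation>weaver</occupation></paragraph></page>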
“…SMT: We propose an image-to-sequence approach with three different feature extractors. The common aspect of all the SMT variants tested in this paper is the decoder, which specifically follows the implementation of [12]. The SMT contains a transformer decoder with eight layers, four attention heads, and an embedding size of 256 features in both the attention and feed-forward modules.…”
Section: Neural Network Configuration (mentioning, confidence: 99%)
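The hyperparameters quoted above map directly onto a standard decoder stack. A minimal PyTorch sketch, assuming a hypothetical vocabulary size and an encoder that has already produced the image-feature memory:

import torch
import torch.nn as nn

VOCAB_SIZE = 512  # hypothetical music-token vocabulary size

# Eight layers, four heads, 256 dims in both attention and feed-forward,
# as stated in the quoted configuration.
layer = nn.TransformerDecoderLayer(
    d_model=256, nhead=4, dim_feedforward=256, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=8)
embed = nn.Embedding(VOCAB_SIZE, 256)
head = nn.Linear(256, VOCAB_SIZE)

tokens = torch.randint(0, VOCAB_SIZE, (1, 10))  # (B, T) target prefix
memory = torch.randn(1, 64, 256)                # (B, S, 256) encoder features
mask = nn.Transformer.generate_square_subsequent_mask(10)  # causal mask
logits = head(decoder(embed(tokens), memory, tgt_mask=mask))  # (B, T, VOCAB_SIZE)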
“…There is no solution that currently goes beyond monophonic transcription, but rather adaptations that reduce the problem in order to make it close to a monophonic scenario and solve it using state-of-the-art methods. According to recent works in HTR [33,13] and Document Understanding (DU) [12,22], OMR should seek to break this monophonic-dependent barrier. In this paper, we propose the Sheet Music Transformer (SMT), an image-to-sequence approach based on autoregressive Transformers that is able to transcribe music input images beyond monophony, without adaptations or specific preprocessing steps.…”
Section: Introduction (mentioning, confidence: 99%)
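An image-to-sequence approach based on autoregressive Transformers implies a step-by-step inference loop: feed the growing prefix back into the decoder until an end-of-sequence token appears. A hedged sketch, assuming a hypothetical model interface, BOS/EOS token ids, and maximum length:

import torch

@torch.no_grad()
def greedy_decode(model, image_features, bos_id=1, eos_id=2, max_len=256):
    """Greedy autoregressive decoding (illustrative; ids and API are assumed)."""
    tokens = torch.tensor([[bos_id]])  # (1, T) prefix, grows one token per step
    for _ in range(max_len):
        logits = model(tokens, image_features)            # (1, T, vocab)
        next_id = logits[:, -1].argmax(-1, keepdim=True)  # most likely next token
        tokens = torch.cat([tokens, next_id], dim=1)
        if next_id.item() == eos_id:
            break
    return tokens.squeeze(0)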