2014 11th IAPR International Workshop on Document Analysis Systems 2014
DOI: 10.1109/das.2014.58

The Maurdor Project: Improving Automatic Processing of Digital Documents

Cited by 40 publications (24 citation statements)
References 5 publications
“…The number of predictions depended on the width of the image. The OM was trained with multiple corpora [11], [12], [13] which included an undocumented database with approximately 153K lines of an historic digitised database 11 and then retrained with the new Bentham data. The system used an hybrid word/character LM that comprised a toplevel LM (3-gram on words/punctuation) for the most frequent words (30k) and a secondary-level LM (10-gram on characters) that dealt with OOV words [14].…”
Section: Competition Protocol (mentioning)
confidence: 99%
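The two-level language model described in the statement above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `hybrid_log_prob`, the `<unk>` token, the toy probabilities, and in particular the combination rule (score an out-of-vocabulary word as the word-level probability of `<unk>` plus a character-level score of its spelling). The quoted text does not say how the cited system combines the two models.

```python
import math

def hybrid_log_prob(word, history, vocab, word_lm, char_lm, unk="<unk>"):
    """Log-probability of `word` under a hypothetical two-level LM.

    In-vocabulary words are scored by the word-level model alone.
    OOV words are scored as log P_word(<unk> | history) plus the
    character-level log-probability of the spelling (assumed rule).
    """
    if word in vocab:
        return word_lm(word, history)
    return word_lm(unk, history) + char_lm(word)

# Toy stand-ins for the real n-gram models (hypothetical values).
VOCAB = {"the", "law"}
WORD_PROBS = {"the": 0.5, "law": 0.4, "<unk>": 0.1}

def toy_word_lm(word, history):
    # Ignores the history; a real model would condition on the previous words.
    return math.log(WORD_PROBS[word])

def toy_char_lm(word):
    # Uniform over 26 letters plus an end-of-word symbol.
    return (len(word) + 1) * math.log(1.0 / 27.0)

in_vocab_score = hybrid_log_prob("law", ("the",), VOCAB, toy_word_lm, toy_char_lm)
oov_score = hybrid_log_prob("zq", ("the",), VOCAB, toy_word_lm, toy_char_lm)
```

In this sketch an OOV word always scores lower than the `<unk>` token itself, because the character-level term is negative; a real system would replace the toy models with the trained 3-gram and 10-gram models.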
“…Hyphenated words were taken as one entire word instead of including its parts. CITlab used Backpropagation-Through-Time (BPTT) [16] using the CTC algorithm [15] for network training 13 .…”
Section: Competition Protocol (mentioning)
confidence: 99%
“…We performed most of the experiments on the handwritten lines in French from the Maurdor dataset [20]. This dataset has the particularity of being very challenging with heterogeneous images from different kind of documents (forms, letters, drawings, ...) and various scanning procedures.…”
Section: Experiments a Experimental Setup (mentioning)
confidence: 99%
“…(2) in which $G_j$ is the $j$-th gallery image, whose average is $\lambda_j$, $\mu_i$ is the mean of those parts of the $i$-th text segment $S_i$ underlying $G_j(x + x', y + y')$ and is a function of $x$ and $y$, […] is the NCC operator and $C_{i,j}$ is the matching image, whose elements $C_{i,j}(x, y)$ are the matching scores in the range of [0,1]. The location and value of the maximum of $C_{i,j}$ is computed as follows, …”
Section: Parallelised Template Matching (mentioning)
confidence: 99%
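The matching step described in the statement above can be sketched as a plain zero-mean normalised cross-correlation (NCC) followed by an argmax. This is a minimal, unoptimised illustration, not the cited paper's parallelised implementation. The function name `ncc_match` and the toy arrays are hypothetical. Zero-mean NCC yields scores in [-1, 1]; the statement reports a [0, 1] range, and how the cited work arrives at that range is not shown in the excerpt.

```python
import numpy as np

def ncc_match(segment, template):
    """Slide `template` over `segment` and score each offset with zero-mean NCC.

    Returns the score map and the (row, col) offset of its maximum.
    """
    seg = np.asarray(segment, dtype=float)
    tpl = np.asarray(template, dtype=float)
    th, tw = tpl.shape
    sh, sw = seg.shape
    if th > sh or tw > sw:
        raise ValueError("template must fit inside the segment")

    tpl_zero = tpl - tpl.mean()
    tpl_norm = np.sqrt((tpl_zero ** 2).sum())

    scores = np.zeros((sh - th + 1, sw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = seg[y:y + th, x:x + tw]
            patch_zero = patch - patch.mean()
            denom = tpl_norm * np.sqrt((patch_zero ** 2).sum())
            # Flat patches (zero variance) keep a score of 0.
            if denom > 0:
                scores[y, x] = (patch_zero * tpl_zero).sum() / denom

    best = np.unravel_index(np.argmax(scores), scores.shape)
    return scores, (int(best[0]), int(best[1]))

# Toy example: embed a 2x2 template in a blank 6x8 segment at row 2, column 3.
template = np.array([[1.0, 2.0], [3.0, 4.0]])
segment = np.zeros((6, 8))
segment[2:4, 3:5] = template
scores, best = ncc_match(segment, template)
```

Here `best` is the offset at which the template aligns exactly with its embedded copy, and the score at that offset is 1 up to floating-point error.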
“…It has numerous applications, particularly in (improving) Optical/Intelligent Character Recognition (OCR/ICR), automatic document analysis and anonymisation [1]. Since the outputs are one of the two classes, binary classification techniques have widely been used to resolve this problem.…”
Section: Introduction (mentioning)
confidence: 99%