2020
DOI: 10.3390/app10165430

Automatic CNN-Based Arabic Numeral Spotting and Handwritten Digit Recognition by Using Deep Transfer Learning in Ottoman Population Registers

Abstract: Historical manuscripts and archival documents are handwritten texts that serve as the backbone sources for historical inquiry. Recent developments in the digital humanities and the need to extract information from historical documents have accelerated digitization. Cutting-edge machine learning methods are applied to extract meaning from these documents. Algorithms for page segmentation (layout analysis), keyword, number, and symbol spotting, and handwritten text recognition are tested on histo…

Cited by 18 publications (11 citation statements)
References 34 publications
“…From the digitization process, conclusions and analyses can be drawn, and new information can be obtained from these documents and archives [2]. Analysis techniques that can be applied to these documents include page segmentation, image spotting of symbols and numbers, Optical Character Recognition (OCR) [3], and Handwritten Text Recognition (HTR) [4].…”
Section: Introduction (unclassified)
“…We used the open-source dhSegment toolbox [23] as in our previous segmentation studies [24] and [25]. The authors describe the toolbox as a general and flexible architecture for pixel-wise segmentation related tasks on historical documents.…”
Section: B Training Deep Learning CNN Model (mentioning)
confidence: 99%
“…Training a deep learning model with these modern datasets and testing with historical digits did not yield high accuracies in the literature [29]. Therefore, we created a dataset containing over 6000 digits, which contributes to the literature on this aspect [9]. The dataset can be accessed at https://urbanoccupations.ku.edu.tr/historicalarabic-handwritten-digit-dataset/ (accessed on 13 September 2021).…”
Section: Related Work (mentioning)
confidence: 99%
“…For this study, we focused on one location: Manisa town in western Anatolia in Turkey. We first employed a CNN-based page segmentation technique to retrieve demographic data of individuals by using the models developed in our previous studies [9,10]. After that, we used horizontal projection profile-based line segmentation to the demographic information of these detected individuals in these registers and obtained the age data in the last line.…”
Section: Introduction (mentioning)
confidence: 99%
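
The horizontal projection profile-based line segmentation mentioned in the statement above can be illustrated with a minimal sketch. This is not the citing authors' implementation: the function name `segment_lines`, the `min_ink` and `min_height` parameters, and the toy page are all hypothetical, and the input is assumed to be an already binarized region (text pixels nonzero, background zero). The idea is to count ink pixels in each row and treat every run of consecutive non-empty rows as one text line; the last run then corresponds to the final line, where the statement says the age data sits.

```python
def segment_lines(binary, min_ink=1, min_height=1):
    """Split a binarized image into text lines via its horizontal projection profile.

    binary: list of rows, each a list of pixels (nonzero = ink, 0 = background).
    Returns a list of (top, bottom) row ranges, bottom exclusive.
    min_ink and min_height are illustrative thresholds, not values from the source.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(1 for px in row if px) for row in binary]

    lines = []
    start = None
    for row, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = row                      # a text line begins
        elif ink < min_ink and start is not None:
            if row - start >= min_height:    # drop runs that are too thin
                lines.append((start, row))
            start = None
    # Close a line that runs to the bottom edge of the image.
    if start is not None and len(profile) - start >= min_height:
        lines.append((start, len(profile)))
    return lines


# Toy 10x8 "page" with two text lines (rows 1-2 and rows 5-7).
page = [[0] * 8 for _ in range(10)]
for r in (1, 2):
    for c in range(2, 6):
        page[r][c] = 1
for r in (5, 6, 7):
    for c in range(1, 7):
        page[r][c] = 1

lines = segment_lines(page)
top, bottom = lines[-1]          # last detected line
last_line = page[top:bottom]     # rows that would be passed on to digit recognition
```

On real scans the profile is usually noisy, so a practical version would likely smooth it or raise `min_ink` before splitting; those choices depend on the material and are not specified in the source.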