2017
DOI: 10.1007/s11042-017-4373-y

PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification

Cited by 80 publications (18 citation statements)
References 29 publications
“…We have used the Cmaterdb dataset version 1.1.1 [37] and 1.5.1 [38], ICDAR 2013 Segmentation dataset [39], and PHDIndic_11 dataset [40] for performing skew correction and segmentation of the word images. Cmaterdb dataset version 1.1.1 and 1.5.1 are two benchmark datasets comprising of Bangla and Devanagari documents, respectively.…”
Section: Experimental Results and Analysis (mentioning)
confidence: 99%
“…Modified log-Gabor filter (MLG) was used for feature extraction to develop bi-script (Devanagari-Roman and Bangla-Roman) and tri-script (Bangla-Devanagari-Roman) word-level script identification modules. In 2017, Obaidullah et al [12] presented a handwritten document image dataset at page-level named PHDIndic_11 having 11 officially recognized Indic scripts: Devanagari, Bangla, Urdu, Roman, Oriya, Gujarati, Gurumukhi, Tamil, Malayalam, Telugu, and Kannada. The paper also contained the results for handwritten script identification (HSI).…”
Section: Related Study (mentioning)
confidence: 99%
“…The extracted text blocks also have a chance of containing lines of varying size, thickness, and white spaces between characters, lines, and words. Instead of performing any homogenizing

$$\text{Classification Accuracy (\%)} = \frac{\#\,\text{successfully classified components}}{\#\,\text{total components present}} \times 100 \qquad (12)$$

…”
Section: Preparation Of Handwritten Indic Script Database (mentioning)
confidence: 99%
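
The classification-accuracy measure quoted in the statement above (successfully classified components divided by total components present, times 100) can be illustrated with a minimal Python sketch. The function name `classification_accuracy` and its parameter names are hypothetical, chosen here for illustration only; they do not come from the citing paper.

```python
def classification_accuracy(num_correct: int, num_total: int) -> float:
    """Return classification accuracy as a percentage.

    num_correct: number of successfully classified components (assumed name).
    num_total:   number of total components present (assumed name).
    """
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    if not 0 <= num_correct <= num_total:
        raise ValueError("num_correct must lie between 0 and num_total")
    # Ratio of correct to total components, scaled to a percentage.
    return 100.0 * num_correct / num_total


# Illustrative values only, not results from any cited paper.
print(classification_accuracy(45, 50))  # 90.0
```

The input checks are a design choice of this sketch: they reject an empty component set rather than returning an undefined percentage.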
“…Features are extracted from sample images of handwritten text in 11 scripts at block, text‐line, and word levels. In Obaidullah, Halder, Santosh, Das, and Roy (), the authors have self‐prepared the PHDIndic_11 page‐level dataset (containing a total of 1,458 pages of handwritten samples in 11 official scripts of India) and compared its classification accuracy using script‐dependent and script‐independent feature sets at page‐level. MLP and simple logistic (SL) classifiers as well as a metaclassifier that combines both MLP and SL are used, and a 116‐element feature vector is extracted from each of the text images.…”
Section: Related Work (mentioning)
confidence: 99%