2014 11th IAPR International Workshop on Document Analysis Systems
DOI: 10.1109/das.2014.74

Towards a Robust OCR System for Indic Scripts

Abstract: The current Optical Character Recognition (OCR) systems for Indic scripts are not robust enough to recognize an arbitrary collection of printed documents. Reasons for this limitation include the lack of resources (e.g., not enough examples with natural variations, and a lack of documentation about the possible font/style variations) and an architecture that necessitates hard segmentation of word images followed by isolated symbol recognition. Variations among scripts, latent symbol to UNICODE…
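To make "hard segmentation of word images followed by isolated symbol recognition" concrete, here is a deliberately naive sketch that cuts a binary word image at ink-free columns; each resulting crop would then go to an isolated symbol classifier. The function names and the list-of-rows image format are assumptions for illustration only, not the method of this paper, and practical segmenters are considerably more elaborate.

```python
# Hypothetical illustration of hard segmentation: split a binary word image
# (list of rows, 1 = ink, 0 = background) into symbol candidates wherever a
# column contains no ink at all.

def segment_at_empty_columns(image):
    """Return (start, end) column ranges (end exclusive) of consecutive ink columns."""
    if not image:
        return []
    width = len(image[0])
    # A column counts as "ink" if any pixel in it is 1.
    has_ink = [any(row[c] for row in image) for c in range(width)]
    segments = []
    start = None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c  # an ink run begins
        elif not ink and start is not None:
            segments.append((start, c))  # an ink run ends at an empty column
            start = None
    if start is not None:
        segments.append((start, width))  # ink run touching the right border
    return segments


def crop(image, segment):
    """Cut out the columns of one segment, e.g. to feed a symbol classifier."""
    s, e = segment
    return [row[s:e] for row in image]


if __name__ == "__main__":
    word = [
        [1, 1, 0, 0, 1, 0, 1, 1],
        [1, 0, 0, 0, 1, 0, 0, 1],
        [1, 1, 0, 0, 1, 0, 1, 1],
    ]
    print(segment_at_empty_columns(word))  # [(0, 2), (4, 5), (6, 8)]

    # Adding a full-width top stroke, like a headline joining characters,
    # leaves no empty column, so the whole word comes back as one segment.
    with_headline = [[1] * 8] + word
    print(segment_at_empty_columns(with_headline))  # [(0, 8)]
```

The second example hints at why such hard segmentation is brittle for Indic text: in scripts such as Devanagari, the headline (shirorekha) joins the characters of a word, so there may be no empty column to cut at.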

Cited by 23 publications (18 citation statements); references 16 publications.
“…In all these cases, our feature learning method consistently improves the accuracy over profile features [10], which are a typical example of a hand-crafted feature for word recognition. Other advantages include that the learnt features are of lower dimension and are therefore compact and efficient, and take less training time.…”
Section: Introduction (mentioning)
confidence: 78%
“…These networks have been used in the past for printed text [2] and handwritten text recognition [3]. This network consists of two LSTM networks: one takes the input from beginning to end, while the other takes the input from end to beginning.…”
Section: RNN for Script and Language Identification (mentioning)
confidence: 99%
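The bidirectional arrangement described in this excerpt (one LSTM reading the sequence from beginning to end, the other from end to beginning, with their outputs combined per time step) can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions: the weights are random, the sizes and the names `make_lstm`, `run_lstm` and `run_blstm` are hypothetical, and the output layer and training procedure are omitted, so it does not reproduce the cited network.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm(input_size, hidden_size, rng):
    """Random (untrained) weights for one LSTM layer with gates i, f, c, o."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    n = input_size + hidden_size
    # Each gate maps [input; previous hidden] to hidden_size values.
    return {gate: (mat(hidden_size, n), [0.0] * hidden_size) for gate in "ifco"}


def affine(W, b, v):
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def run_lstm(params, sequence, hidden_size):
    """Run one LSTM over a list of input vectors; return the hidden state at every step."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in sequence:
        z = list(x) + h  # concatenate current input with previous hidden state
        i = [sigmoid(a) for a in affine(*params["i"], z)]   # input gate
        f = [sigmoid(a) for a in affine(*params["f"], z)]   # forget gate
        g = [math.tanh(a) for a in affine(*params["c"], z)] # candidate cell values
        o = [sigmoid(a) for a in affine(*params["o"], z)]   # output gate
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        outputs.append(h)
    return outputs


def run_blstm(fwd, bwd, sequence, hidden_size):
    """Bidirectional LSTM: a forward pass and a backward pass, concatenated per step."""
    forward = run_lstm(fwd, sequence, hidden_size)
    # Process the reversed sequence, then flip the outputs back into forward time order.
    backward = run_lstm(bwd, sequence[::-1], hidden_size)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


if __name__ == "__main__":
    rng = random.Random(0)
    D, H = 3, 4  # hypothetical feature and hidden sizes
    fwd, bwd = make_lstm(D, H, rng), make_lstm(D, H, rng)
    seq = [[rng.random() for _ in range(D)] for _ in range(5)]
    out = run_blstm(fwd, bwd, seq, H)
    print(len(out), len(out[0]))  # 5 8
```

The design point is that the output at step t combines a forward state, which has seen inputs up to t, with a backward state, which has seen inputs from t to the end, so every position gets context from both sides of the sequence.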
“…For this, we use the popular profile features [2,17], which can be used to represent the lines and words as a feature sequence. In this work, we calculate six profile features from every word and image.…”
Section: A Representation of Words and Lines (mentioning)
confidence: 99%
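Profile features, as described in this excerpt, turn a word or line image into a left-to-right feature sequence by computing a few statistics per pixel column. The sketch below computes four commonly used column profiles; this particular set, its normalization, and the name `profile_features` are assumptions for illustration, since the excerpt does not list the six features the citing work actually uses.

```python
def profile_features(image):
    """Per-column profile features for a binary image (list of rows, 1 = ink).

    Returns one feature vector per column, so the image becomes a sequence.
    Assumed features (not necessarily those of the cited work):
      0. vertical projection: fraction of ink pixels in the column
      1. upper profile: normalized row index of the first ink pixel
      2. lower profile: normalized row index of the last ink pixel
      3. transitions: number of background-to-ink (0 -> 1) changes down the column
    Columns without ink get 1.0 for the upper and 0.0 for the lower profile.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    features = []
    for c in range(width):
        column = [image[r][c] for r in range(height)]
        ink_rows = [r for r, p in enumerate(column) if p]
        projection = len(ink_rows) / height
        if ink_rows:
            scale = (height - 1) if height > 1 else 1
            upper = ink_rows[0] / scale
            lower = ink_rows[-1] / scale
        else:
            upper, lower = 1.0, 0.0
        # Pair each pixel with the one above it (a virtual 0 above the top row).
        transitions = sum(1 for a, b in zip([0] + column, column) if a == 0 and b == 1)
        features.append([projection, upper, lower, float(transitions)])
    return features


if __name__ == "__main__":
    img = [
        [0, 1, 0],
        [0, 0, 1],
        [0, 1, 1],
        [0, 0, 0],
    ]
    feats = profile_features(img)
    print(len(feats))  # 3
    print(feats[0])    # [0.0, 1.0, 0.0, 0.0]
```

Because the output has one vector per column regardless of image width, it can be fed directly to a sequence model such as the BLSTM mentioned in the excerpt above.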
“…Many methods ([1]; [4]; [5]; [15]) have been developed in the past for recognizing Indian scripts; they generally focused on plain-background images and used connected component analysis for script recognition. The scope of the existing methods is limited to a particular script rather than multiple scripts, because these methods exploited knowledge of the scripts for script recognition.…”
Section: Introduction (mentioning)
confidence: 99%
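Connected component analysis, which this excerpt names as the basis of earlier script recognizers, groups touching ink pixels into separate blobs. Below is a generic breadth-first labeling sketch on a binary image; the function names and the choice of 8-connectivity by default are assumptions, and how the cited methods turn components into script decisions is not described here.

```python
from collections import deque


def connected_components(image, eight_connected=True):
    """Group ink pixels (value 1) of a binary image into connected components.

    Returns a list of components, each a list of (row, col) pixels, in the
    order their first pixel is met when scanning rows top to bottom.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    if eight_connected:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    components = []
    for r in range(height):
        for c in range(width):
            if image[r][c] and not seen[r][c]:
                # Breadth-first flood fill from an unvisited ink pixel.
                seen[r][c] = True
                queue = deque([(r, c)])
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in offsets:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < height and 0 <= nx < width and image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components


def bounding_box(pixels):
    """(top, left, bottom, right) of a component, inclusive."""
    rows = [p[0] for p in pixels]
    cols = [p[1] for p in pixels]
    return min(rows), min(cols), max(rows), max(cols)


if __name__ == "__main__":
    img = [
        [1, 0, 0, 1],
        [0, 1, 0, 1],
        [0, 0, 0, 0],
        [1, 1, 0, 0],
    ]
    print(len(connected_components(img)))                         # 3
    print(len(connected_components(img, eight_connected=False)))  # 4
```

The connectivity choice matters: with 8-connectivity the diagonal pixels at (0, 0) and (1, 1) form one component, while 4-connectivity splits them, which changes the component statistics any downstream recognizer would see.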