Proceedings of the 9th IAPR International Workshop on Document Analysis Systems 2010
DOI: 10.1145/1815330.1815394

A post-processing scheme for Malayalam using statistical sub-character language models

Abstract: Most of the Indian scripts do not have any robust commercial OCRs. Many of the laboratory prototypes report reasonable results at the recognition/classification stage. However, word-level accuracies are still poor. It is well known that word accuracy decreases as the number of characters in a word increases. For Malayalam, the average number of characters in a word is almost twice that of English. Moreover, the number of words required to cover 80% of the Malayalam language is more than forty times that of other In…

Citations: cited by 7 publications (6 citation statements)
References: 17 publications
“…This information is used to generate alternate words and correct substitution errors, if any. The correction is aided by confusion matrix and statistical sub-character language models [14]. The ranked set of candidate words is processed by validation unit to generate the unique text output.…”
Section: Overview of Parsing and Recognition (mentioning)
confidence: 99%
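
The correction step quoted above (alternate words generated from a confusion matrix, then ranked with a language model) can be pictured with a minimal sketch. It is an illustration only, not the cited authors' implementation: the function name `rank_candidates`, the romanized placeholder symbols, the probability values, and the `floor` smoothing constant are all assumptions made for the example.

```python
import math
from itertools import product

def rank_candidates(recognized, confusions, bigram_prob, floor=1e-6):
    """Generate alternate symbol sequences and rank them, best first.

    recognized  : list of symbols output by the classifier
    confusions  : hypothetical map symbol -> {alternative: P(alternative | symbol)},
                  with all probabilities strictly positive
    bigram_prob : hypothetical map (a, b) -> joint probability of adjacent symbols
    floor       : assumed probability for unseen pairs (simple smoothing)
    """
    # Symbols without a confusion entry keep themselves with probability 1.
    options = [confusions.get(sym, {sym: 1.0}).items() for sym in recognized]
    scored = []
    for combo in product(*options):
        symbols = [alt for alt, _ in combo]
        # Confusion score: how plausible each substitution is.
        score = sum(math.log(p) for _, p in combo)
        # Language-model score: how plausible each adjacent pair is.
        score += sum(math.log(bigram_prob.get(pair, floor))
                     for pair in zip(symbols, symbols[1:]))
        scored.append((score, symbols))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [symbols for _, symbols in scored]

# Toy data: the classifier read "ta" where "aa" may have been intended.
recognized = ["ka", "ta", "la"]
confusions = {"ta": {"ta": 0.6, "aa": 0.4}}
bigram_prob = {("ka", "aa"): 0.4, ("aa", "la"): 0.2}
ranked = rank_candidates(recognized, confusions, bigram_prob)
```

In this toy run the substituted sequence ranks above the raw classifier output, because the pairs in the raw output were never seen by the language model. A validation unit, as the quoted statement describes, would then pick the final word from the ranked list.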
“…Our parsing module also looks at the statistical sub-character language models (SSLM) as described in [14]. The SSLM model describes the joint probability of pairs of adjacent symbols (sub-characters) appearing in a language.…”
Section: Parsing (mentioning)
confidence: 99%
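
As a rough illustration of "joint probability of pairs of adjacent symbols", the sketch below estimates such probabilities by counting adjacent pairs in a small corpus. It assumes plain relative-frequency estimation with no smoothing, which may differ from the SSLM described in [14]; the function name and the romanized placeholder symbols are hypothetical.

```python
from collections import Counter

def bigram_joint_probabilities(words):
    """Estimate P(a, b) for adjacent symbol pairs.

    words: list of words, each given as a list of sub-character symbols.
    Returns a dict mapping (a, b) to its relative frequency among all
    adjacent pairs in the corpus.
    """
    pair_counts = Counter()
    for symbols in words:
        # zip(symbols, symbols[1:]) yields each adjacent pair once.
        pair_counts.update(zip(symbols, symbols[1:]))
    total = sum(pair_counts.values())
    return {pair: count / total for pair, count in pair_counts.items()}

# Toy corpus of three words, already split into placeholder symbols.
corpus = [["ka", "aa", "la"], ["ka", "aa", "ma"], ["ma", "la"]]
probs = bigram_joint_probabilities(corpus)
```

Here the corpus contains five adjacent pairs, two of which are ("ka", "aa"), so that pair receives probability 2/5. A pair that never occurs is simply absent from the result, which is why a practical model would usually add some form of smoothing.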