A post-processing scheme for Malayalam using statistical sub-character language models

Mohan, Karthika; Jawahar, C. V.

doi:10.1145/1815330.1815394

Cited by 7 publications

(6 citation statements)

References 17 publications


“…This information is used to generate alternate words and correct substitution errors, if any. The correction is aided by confusion matrix and statistical sub-character language models [14]. The ranked set of candidate words is processed by validation unit to generate the unique text output.…”

Section: Overview of Parsing and Recognition (mentioning, confidence: 99%)
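The statement above describes generating alternate words from a confusion matrix to correct substitution errors. The sketch below is a minimal illustration under stated assumptions. The confusion table is hypothetical, and Latin letters stand in for Malayalam sub-characters. Each candidate is scored by the product of its per-symbol probabilities, which is an independence assumption made here, not necessarily the scoring used in [14]. The name `alternates` is also hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical confusion table: for each symbol the classifier emitted,
# the probability that the true symbol is each listed alternative.
# Latin letters stand in for Malayalam sub-characters.
CONFUSION = {
    "a": {"a": 0.8, "o": 0.2},
    "n": {"n": 0.7, "m": 0.3},
    "t": {"t": 1.0},
}


def alternates(recognized, confusion, top_k=3):
    """Return the top_k candidate words for a recognized symbol sequence.

    Each symbol is replaced by the alternatives it is confused with, and a
    candidate is scored by the product of its per-symbol probabilities
    (assumes independence between positions). Symbols missing from the
    table are kept as-is with probability 1.
    """
    options = [list(confusion.get(s, {s: 1.0}).items()) for s in recognized]
    scored = []
    for combo in product(*options):  # every combination of alternatives
        word = "".join(sym for sym, _ in combo)
        score = prod(p for _, p in combo)
        scored.append((word, score))
    scored.sort(key=lambda ws: ws[1], reverse=True)  # best first
    return scored[:top_k]


print(alternates("ant", CONFUSION))
```

Exhaustive enumeration grows exponentially with word length. That is one reason to cast ranking as a graph search, as the citing paper does.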

“…Our parsing module also looks at the statistical sub-character language models (SSLM) as described in [14]. The SSLM model describes the joint probability of pairs of adjacent symbols (sub-characters) appearing in a language.…”

Section: Parsing (mentioning, confidence: 99%)
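The SSLM is described above as the joint probability of pairs of adjacent sub-characters. The sketch below shows how such a model might be estimated from a corpus of symbol sequences. It assumes add-alpha smoothing over the observed symbol vocabulary, which is an illustrative choice because this excerpt does not state the smoothing used in [14]. The names `train_sslm` and `sslm_score` are hypothetical.

```python
from collections import Counter
from math import log


def train_sslm(corpus, alpha=1.0):
    """Estimate a smoothed joint distribution P(x, y) over adjacent symbol pairs.

    corpus: iterable of symbol sequences (strings or lists of sub-characters).
    alpha:  add-alpha smoothing constant (an assumption of this sketch).
    The returned function sums to 1 over all pairs of observed symbols;
    unseen pairs, including those with unseen symbols, get alpha's share.
    """
    pairs = Counter()
    vocab = set()
    for seq in corpus:
        vocab.update(seq)
        pairs.update(zip(seq, seq[1:]))  # adjacent (left, right) pairs
    denom = sum(pairs.values()) + alpha * len(vocab) ** 2

    def joint(x, y):
        return (pairs[(x, y)] + alpha) / denom

    return joint


def sslm_score(seq, joint):
    """Heuristic score: sum of log P(x, y) over adjacent symbol pairs."""
    return sum(log(joint(x, y)) for x, y in zip(seq, seq[1:]))


joint = train_sslm(["ant", "and", "tan"])
print(sslm_score("ant", joint) > sslm_score("amt", joint))  # True
```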

“…The utility of the SSLM in recognition is demonstrated in [14] during post processing of the classifier outputs. Generation of alternate words, their ranking and selection of optimal word are posed as an optimization problem in [14]. Here, we extend this framework by using the SSLM during the parsing phase.…”

Section: Parsing (mentioning, confidence: 99%)

“…Thus confusion matrix helps in generation of alternatives in a probabilistic setting. The task is modelled as a shortest path finding problem in a multi stage graph [14]. A path from the source node to the destination node in the multistage graph denotes the recognized text corresponding to the input word and the cost of the path influences the rank of the recognized text in the candidate set.…”

Section: Parsing (mentioning, confidence: 99%)
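The statement above frames candidate ranking as a shortest-path problem in a multistage graph. The sketch below is one possible formulation, assuming the following structure:

- Each stage is one symbol position.
- Each node is a candidate symbol with a confusion probability.
- Each edge between adjacent stages carries a pair probability, for example from an SSLM.
- Costs are negative log probabilities, so the cheapest path is the most probable word.
- The source and destination are implicit (before the first stage and after the last).

This cost function is an assumption for illustration; the excerpt does not give the exact one. The name `best_path` is hypothetical.

```python
from math import log


def best_path(stages, pair_prob):
    """Minimum-cost source-to-destination path through a multistage graph.

    stages:    list of dicts, one per symbol position, mapping a candidate
               symbol to its confusion probability at that position.
    pair_prob: function (x, y) -> probability of the adjacent pair x, y.
    Node cost = -log(confusion prob); edge cost = -log(pair prob).
    Returns (word, cost) for the cheapest path.
    """
    # best[s] = (cost of cheapest path ending at symbol s, that path)
    best = {s: (-log(p), s) for s, p in stages[0].items()}
    for stage in stages[1:]:
        nxt = {}
        for y, py in stage.items():
            # Relax every edge from the previous stage into node y.
            cost, path = min(
                (c - log(pair_prob(x, y)), pth) for x, (c, pth) in best.items()
            )
            nxt[y] = (cost - log(py), path + y)
        best = nxt
    cost, word = min(best.values())
    return word, cost


def pair_prob(x, y):
    # Toy pair model that strongly prefers the sequence "on".
    return 0.9 if x + y == "on" else 0.1


print(best_path([{"a": 0.6, "o": 0.4}, {"n": 1.0}], pair_prob))
```

The classifier alone slightly favors "a" in the first position. The pair model outweighs that preference, so the cheapest path spells "on". The dynamic program visits each edge once, which avoids enumerating every candidate word.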

“…In the second stage, the image fragments are parsed based on a grammar to create a set of feasible fusions. Ranked list of alternate words for the most probable word is then generated using prior knowledge available in the form of Statistical Sub-character Language Models(SSLM) [14] and confusion matrix. The correctness of words in the candidate set is confirmed by a verification unit.…”

Section: Introduction and Related Work (mentioning, confidence: 99%)
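The statement above ends with a verification unit that confirms words in the ranked candidate set. The excerpt does not say how verification is done. The sketch below assumes a simple lexicon lookup as a hypothetical stand-in. It returns the best-ranked candidate that passes and falls back to the top candidate so that a word is always emitted. The name `verify` is hypothetical.

```python
def verify(ranked, lexicon):
    """Return the highest-ranked candidate that passes verification.

    ranked:  candidate words ordered best-first (e.g. by path cost).
    lexicon: set of valid words; a hypothetical stand-in for the
             verification unit. Falls back to the top candidate when
             none verifies, and returns None for an empty list.
    """
    for word in ranked:
        if word in lexicon:
            return word
    return ranked[0] if ranked else None


print(verify(["amt", "ant", "ont"], {"ant", "on"}))  # ant
```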


Towards recognition of degraded words by probabilistic parsing

Mohan, Jinesh, Jawahar (2010)

Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing

Self-citation

Although Indian-language OCR systems have shown significant improvement in classification rates in recent years, recognizing degraded words still poses a major challenge for the development of robust OCR systems. This work attempts to formulate degraded-word recognition in a generic and formal structure, namely as a probabilistic parsing problem. A probabilistic-parsing framework is used to rank and validate the possible hypotheses. It is combined with an alternate-word generator, a symbol recognizer and a verification unit to improve recognition rates on degraded words without compromising accuracy on good characters. We demonstrate the method on Malayalam, evaluating it on a complete annotated book, where around 65% of the degraded words are correctly recognized.


Malayalam Offline Handwritten Recognition Using Probabilistic Simplified Fuzzy ARTMAP

Vidya, Indhu, Bhadran, et al. (2013)

Advances in Intelligent Systems and Computing

A survey on optical character recognition for Bangla and Devanagari scripts

Bag, Harit (2013)

Sadhana

The past few decades have witnessed intensive research on optical character recognition (OCR) for Roman, Chinese, and Japanese scripts. Much work has also been reported on OCR for various Indian scripts, such as Devanagari, Bangla, Oriya, Tamil, Telugu, Malayalam, Kannada, Gurmukhi, and Gujarati. In this paper, we review OCR work on Indian scripts, focusing mainly on Bangla and Devanagari, the two most popular scripts in India. We summarize most of the published papers on this topic and analyse the various methodologies and their reported results. Future directions for research in OCR for Indian scripts are also given.
