Use of lexical and syntactic techniques in recognizing handwritten text

Srihari, Rohini K.

doi:10.3115/1075812.1075911

Cited by 10 publications

(7 citation statements)

References 7 publications


“…The lexicon is such a source of linguistic and domain knowledge. Most of the recognition systems rely on a lexicon during the recognition, the so-called lexicon-driven systems, or also after the recognition as a postprocessor of the recognition hypotheses [20,46,77]. However, systems that rely on a lexicon in the early stages have had more success, since they look directly for a valid word [20].…”

Section: The Role Of Language Model In Handwriting Recognition (mentioning)

confidence: 99%

Large vocabulary off-line handwriting recognition: A survey

Koerich

Sabourin

Suen

2003

Pattern Analysis & Applications

138

Considerable progress has been made in handwriting recognition technology over the last few years. Thus far, handwriting recognition systems have been limited to small and medium vocabulary applications, since most of them often rely on a lexicon during the recognition process. The capability of dealing with large lexicons, however, opens up many more applications. This article will discuss the methods and principles that have been proposed to handle large vocabularies and identify the key issues affecting their future deployment. To illustrate some of the points raised, a large vocabulary off-line handwritten word recognition system will be described.

“…estimated probability distribution. High-quality language models lie at the heart of most NL applications, such as speech recognition [22], machine translation [7], spelling correction [24] and handwriting recognition [46]. The most successful class of language models are n-gram models, introduced three decades ago [6].…”

Section: Introduction (mentioning)

confidence: 99%

On the localness of software

Devanbu

2014

Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering

232

197

The n-gram language model, which has its roots in statistical natural language processing, has been shown to successfully capture the repetitive and predictable regularities ("naturalness") of source code, and help with tasks such as code suggestion, porting, and designing assistive coding devices. However, we show in this paper that this natural-language-based model fails to exploit a special property of source code: localness. We find that human-written programs are localized: they have useful local regularities that can be captured and exploited. We introduce a novel cache language model that consists of both an n-gram and an added "cache" component to exploit localness. We show empirically that the additional cache component greatly improves the n-gram approach by capturing the localness of software, as measured by both cross-entropy and suggestion accuracy. Our model's suggestion accuracy is actually comparable to a state-of-the-art, semantically augmented language model; but it is simpler and easier to implement. Our cache language model requires nothing beyond lexicalization, and thus is applicable to all programming languages.
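The combination of an n-gram component and a cache component described in this abstract can be illustrated with a small sketch. The abstract does not say how the two components are mixed, so the fixed interpolation weight, the add-one smoothing, the bigram order, and all names below (`BigramModel`, `cache_lm_prob`, `weight`) are assumptions made for illustration only, not the authors' implementation.

```python
from collections import Counter, defaultdict


class BigramModel:
    """Bigram estimates with add-one smoothing (an assumed, simple choice)."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.pair_counts = defaultdict(Counter)

    def update(self, tokens):
        # Count each adjacent (previous, current) token pair.
        for prev, cur in zip(tokens, tokens[1:]):
            self.pair_counts[prev][cur] += 1

    def prob(self, prev, cur):
        following = self.pair_counts[prev]
        total = sum(following.values())
        return (following[cur] + 1) / (total + self.vocab_size)


def cache_lm_prob(global_model, cache_model, prev, cur, weight=0.5):
    """Linearly interpolate the global estimate with the local cache estimate.

    `weight` is a hypothetical fixed mixing weight for the cache component.
    """
    return ((1 - weight) * global_model.prob(prev, cur)
            + weight * cache_model.prob(prev, cur))


# Toy data: a "corpus" seen in training and a "local" file being edited.
corpus = "for i in range ( n ) : total += i".split()
local = "for item in items : count += item".split()
vocab = set(corpus) | set(local)

global_model = BigramModel(len(vocab))
global_model.update(corpus)
cache_model = BigramModel(len(vocab))
cache_model.update(local)

# "for item" never occurs in the corpus but does occur locally.
global_only = global_model.prob("for", "item")
mixed = cache_lm_prob(global_model, cache_model, "for", "item")
total = sum(cache_lm_prob(global_model, cache_model, "for", w) for w in vocab)

print(global_only, mixed, total)
```

In this toy setup the interpolated estimate for a locally frequent pair is higher than the global estimate alone, which is the intuition behind exploiting localness; the mixture still sums to one over the vocabulary because each component does.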

“…This is useful in a large variety of areas including speech recognition, optical character recognition, handwriting recognition, machine translation, and spelling correction (Church, 1988; Brown et al., 1990; Kernighan, Church & Gale, 1990; Hull, 1992; Srihari and Baltus, 1992). The most commonly used language models are very simple (e.g.…”

Section: Overview (mentioning)

confidence: 99%

A bit of progress in language modeling

Goodman

2001

Computer Speech & Language

344

291

In the past several years, a number of different language modeling improvements over simple trigram models have been found, including caching, higher-order n-grams, skipping, interpolated Kneser-Ney smoothing, and clustering. We present explorations of variations on, or of the limits of, each of these techniques, including showing that sentence mixture models may have more potential. While all of these techniques have been studied separately, they have rarely been studied in combination. We compare a combination of all techniques together to a Katz smoothed trigram model with no count cutoffs. We achieve perplexity reductions between 38 and 50% (1 bit of entropy), depending on training data size, as well as a word error rate reduction of 8.9%. Our perplexity reductions are perhaps the highest reported compared to a fair baseline.
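The link between the perplexity figures and "1 bit of entropy" in this abstract follows from the standard definition of perplexity as two raised to the cross-entropy in bits. The short derivation below assumes that definition; the symbols $H$, $\Delta H$, and $\mathrm{PP}$ are introduced here for illustration and do not appear in the abstract.

```latex
\[
\mathrm{PP} = 2^{H}
\quad\Longrightarrow\quad
\frac{\mathrm{PP}_{\text{new}}}{\mathrm{PP}_{\text{old}}}
  = 2^{-\Delta H},
\qquad \Delta H = H_{\text{old}} - H_{\text{new}} .
\]
\[
\Delta H = 1 \text{ bit}
\;\Longrightarrow\;
\frac{\mathrm{PP}_{\text{new}}}{\mathrm{PP}_{\text{old}}} = \tfrac{1}{2}
\quad (\text{a } 50\% \text{ reduction}),
\qquad
\text{a } 38\% \text{ reduction}
\;\Longrightarrow\;
\Delta H = -\log_2 0.62 \approx 0.69 \text{ bits}.
\]
```

Under this reading, the parenthetical "1 bit of entropy" corresponds to the upper end of the reported 38 to 50% range.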
