Joseph J. Pollock scite author profile

The SPEEDCOP (SPElling Error Detection Correction Project) project recently completed at Chemical Abstracts Service (CAS) extracted over 50,000 misspellings from approximately 25,000,000 words of text from seven scientific and scholarly databases. The misspellings were automatically classified and the error types analyzed. The results, which were consistent over the different databases, showed that the expected incidence of misspelling is 0.2%, that 90-95% of spelling errors have only a single mistake, that substitution is homogeneous while transposition is heterogeneous, that omission is the commonest type of misspelling, and that inadvertent doubling of a letter is the most important cause of insertion errors. The more frequently a letter occurs in the text, the more likely it is to be involved in a spelling error. Most misspellings collected by SPEEDCOP are of the type colloquially referred to as "typos" and approximately 90% are unlikely to be repeated in normal spans of text.
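The four single-error classes named in the abstract (omission, insertion, substitution, transposition) can be illustrated with a small sketch. This is not SPEEDCOP's classifier, whose details are not given here; the function name `classify_single_error` and the label strings are hypothetical, and the sketch assumes the intended word is already known.

```python
def classify_single_error(correct: str, typed: str) -> str:
    """Label how `typed` differs from `correct`, assuming at most one mistake.

    Illustrative only: the labels follow the error classes named in the
    abstract, but the logic is an assumption, not the SPEEDCOP procedure.
    """
    if typed == correct:
        return "correct"

    len_c, len_t = len(correct), len(typed)

    # Find the first position where the two strings disagree.
    i = 0
    while i < min(len_c, len_t) and correct[i] == typed[i]:
        i += 1

    # Omission: dropping one letter of the correct word gives the typed word.
    if len_t == len_c - 1 and correct[:i] + correct[i + 1:] == typed:
        return "omission"

    # Insertion: dropping one letter of the typed word gives the correct word.
    if len_t == len_c + 1 and typed[:i] + typed[i + 1:] == correct:
        # A doubled letter shows up as an extra copy of the preceding letter.
        if i > 0 and typed[i] == typed[i - 1]:
            return "insertion (doubling)"
        return "insertion"

    if len_t == len_c:
        # Substitution: exactly one position differs.
        if correct[i + 1:] == typed[i + 1:]:
            return "substitution"
        # Transposition: two adjacent letters are swapped.
        if (i + 1 < len_c
                and correct[i] == typed[i + 1]
                and correct[i + 1] == typed[i]
                and correct[i + 2:] == typed[i + 2:]):
            return "transposition"

    return "multiple"
```

For example, `classify_single_error("the", "teh")` reports a transposition, while a pair such as `"separate"` and `"seperete"` falls outside the single-error classes and is labelled `"multiple"`.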

The use of trigram analysis for spelling error detection

Zamora, Pollock, Zamora

1981

Information Processing & Management

Automatic Abstracting Research at Chemical Abstracts Service

Pollock, Zamora

1975

J. Chem. Inf. Comput. Sci.

Spelling Error Detection and Correction by Computer: Some Notes and a Bibliography

Pollock

1982

INTRODUCTION

Not only does the problem of correcting spelling errors by computer have a long history, it is evidently of considerable current interest, as papers 17,95 and letters 18,30,57,66,69 on the topic continue to appear rapidly. This is not surprising, since techniques useful in detecting and correcting mis-spellings normally have other important applications. Moreover, both the power of small computers and the routine production of machine-readable text have increased enormously over the last decade, to the point where automatic spelling error detection/correction has become not only feasible but highly desirable.

Potential applications for spelling error detection/correction techniques arise in numerous areas. Early papers focused on the correction of output from optical character recognition (OCR), voice recognition, or Morse code, or on spelling errors in program code, but the domain of most interest today is probably the correction of machine-readable text made available by word processing. However, methods for assessing the similarity of two strings of symbols, which are widely used to compare mis-spellings with dictionary words, are of very general interest; e.g., for determining the evolutionary distance of proteins. 56,70,72 Similarly, one can imagine spelling correction techniques being extended to almost any kind of error-prone transmission, even to partially decrypted code. Also, spelling error detection involves searching large dictionaries, and this capability is obviously of widespread utility.

This note attempts to provide a comprehensive bibliography of papers in English on the major aspects of spelling error detection and correction of English text. The author is solely responsible for the content of the annotations.

SPELLING ERROR DETECTION

The goal of spelling error detection is basically to decide if a text string is a valid word; this is normally done by determining whether or not the string is in a system dictionary. As both the dictionary and the number of words to be processed are usually large in real-world systems, it is important to make the dictionary search highly efficient. Note that words need not be literally present in the dictionary; they may be stored much more economically as, for example, hash codes, patterns of bits distributed over a long string, or n-grams. However, in compressed representations, one usually has to be content with a very high probability that a given word is present or not, rather than with the certainty given by a literal dictionary. Similarly, the dictionary may be searched via tries, trees, hash coding (scatter storage), or a variety of other techniques.
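The idea of storing words as "patterns of bits distributed over a long string" can be sketched roughly as follows. This is a generic Bloom-filter-style illustration, not a method taken from any paper in the bibliography; the class name `BitDictionary`, its parameters, and the use of SHA-256 as the hash are all assumptions made for the example.

```python
import hashlib


class BitDictionary:
    """Probabilistic word list: each word sets a few bits in one long bit string.

    Hypothetical sketch. A lookup can wrongly report an absent word as
    present (with small probability), but never reports a stored word
    as absent.
    """

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, word: str):
        # Derive several bit positions per word by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word: str) -> None:
        for pos in self._positions(word.lower()):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word: str) -> bool:
        # The word is "probably present" only if every one of its bits is set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(word.lower())
        )


dictionary = BitDictionary()
for entry in ["spelling", "error", "detection"]:
    dictionary.add(entry)

# Strings whose bits are not all set are flagged as possible misspellings.
suspects = [w for w in ["spelling", "speling", "error"] if w not in dictionary]
```

The trade-off matches the one described in the text: the bit string is far smaller than a literal word list, at the cost of accepting a very high probability, rather than certainty, that a reported word is really in the dictionary.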

Automatic spelling correction in scientific and scholarly text

Collection and characterization of spelling errors in scientific and scholarly text
