1983
DOI: 10.1002/asi.4630340108

Collection and characterization of spelling errors in scientific and scholarly text

Abstract: The SPEEDCOP (SPElling Error Detection Correction Project) project recently completed at Chemical Abstracts Service (CAS) extracted over 50,000 misspellings from approximately 25,000,000 words of text from seven scientific and scholarly databases. The misspellings were automatically classified and the error types analyzed. The results, which were consistent over the different databases, showed that the expected incidence of misspelling is 0.2%, that 90-95% of spelling errors have only a single mistake, that su…

Cited by 58 publications (47 citation statements)
References 9 publications
“…The bulk of the errors there were shown to be single-character insertions, deletions and substitutions, in line with the findings of previous studies, the largest of which was [5]. In Table 3 and Table 4 we list comparable statistics obtained from the OCRed corpora we here work with: statistics on 5,047 mainly OCR-errors from the sgd and 3,799 from the ddd.…”
Section: OCR-errors and Other Lexical Variation in Corpora (supporting)
confidence: 83%
“…Many approaches have been applied since people started to deal with this problem. Different techniques like edit distance [4], rule-based techniques [10], n-grams [20], probabilistic techniques [14], neural nets [15], similarity key techniques [16,17] and noisy channel model [18,19] have been proposed. All of these are based on the idea of calculating the similarity between the misspelled word and the words contained in a dictionary.…”
Section: Approaches of Some Spell Checkers (mentioning)
confidence: 99%
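The dictionary-similarity idea in the statement above can be illustrated with edit distance, the first technique it lists. This is a minimal sketch, not the method of any cited paper: the function names `edit_distance` and `suggest`, the `max_distance` parameter, and the toy word list are all assumptions made for illustration. It uses plain Levenshtein distance, which counts single-character insertions, deletions, and substitutions (a transposition therefore costs 2).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (or match)
            ))
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=1):
    """Hypothetical helper: return (distance, candidate) pairs within
    max_distance of word, closest first."""
    scored = ((edit_distance(word, w), w) for w in dictionary)
    return sorted((d, w) for d, w in scored if d <= max_distance)


# Toy dictionary, invented for this example.
words = ["spelling", "spewing", "selling", "error"]
print(suggest("speling", words))
```

A threshold of 1 is a natural default here given the abstract's finding that 90-95% of spelling errors contain only a single mistake, though a real checker would compare against a far larger dictionary and would need a faster candidate search than this linear scan.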
“…Pollock and Zamora report on a spelling error detection project at Chemical Abstracts Service (CAS) and characterize the types of errors they found [8]. Chemical Abstracts databases are among the most searched databases in the world. CAS is usually characterized as a set of sources with considerable depth and breadth.…”
Section: Context-matching and Bias (mentioning)
confidence: 99%