Unsupervised Context-Sensitive Spelling Correction of Clinical Free-Text with Word and Character N-Gram Embeddings

Fivez, Pieter; Šuster, Simon; Daelemans, Walter

doi:10.18653/v1/w17-2317

Cited by 46 publications

(42 citation statements)

References 11 publications


“…To achieve this goal, the misspelled words must be corrected. The spell-checkers proposed by Fivez et al [7] and Lu et al [21] are in biomedical domains. The implementation code of Fivez et al's is not publicly available.…”

Section: Methods (mentioning)

confidence: 99%

“…(1) Generation of a candidate pool: Candidate suggestions for each detected misspelling are generated by returning all words from our dictionary that have an edit distance [19] up to a given threshold. (2) Scoring Function: Given a misspelled token in a text and a set of candidate corrections for that token, the scoring function ranks all suggested candidates based on the following four scores (see Figure 1): [7] calculated contextual similarity scores using neural word embeddings, taking the context around the misspelling into account. To calculate the contextual similarity score for a candidate, this paper uses a similar approach.…”

Section: Methods (mentioning)

confidence: 99%
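
The two steps described in the statement above (an edit-distance candidate pool, then context-based ranking) can be illustrated with a minimal Python sketch. It is not the implementation of Fivez et al. or of the citing paper: the lexicon, the two-dimensional "embeddings", the distance threshold of 2, and all function names are hypothetical and chosen only to keep the example small. Context is represented here as a plain sum of word vectors, which is a simplifying assumption.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def generate_candidates(misspelling, lexicon, max_distance=2):
    """Step 1: keep every lexicon word within the edit-distance threshold."""
    return [w for w in lexicon if edit_distance(misspelling, w) <= max_distance]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_candidates(candidates, context, vectors):
    """Step 2: rank candidates by similarity to the summed context vectors."""
    dim = len(next(iter(vectors.values())))
    ctx = [0.0] * dim
    for word in context:
        if word in vectors:
            ctx = [c + x for c, x in zip(ctx, vectors[word])]
    scored = [(cosine(vectors[c], ctx), c) for c in candidates if c in vectors]
    return [c for _, c in sorted(scored, reverse=True)]


# Toy data (assumed for illustration only; real systems use trained embeddings).
LEXICON = ["patient", "patent", "parent", "potent"]
VECTORS = {
    "patient": [0.9, 0.1],
    "patent": [0.1, 0.9],
    "parent": [0.5, 0.5],
    "potent": [0.3, 0.6],
    "hospital": [0.8, 0.2],
    "admitted": [0.9, 0.2],
}

candidates = generate_candidates("patiet", LEXICON)
ranking = rank_candidates(candidates, ["admitted", "hospital"], VECTORS)
print(candidates, ranking)
```

With these toy vectors, the clinical context words pull the ranking toward the clinically plausible candidate; with real data the outcome depends entirely on the embeddings and lexicon used.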

“…Gao et al and Belinkov and Bisk used spell-checkers that were freely available, but novel spell-checkers have been proposed by recent papers. Fivez et al [7] proposed a context-sensitive spelling correction method for clinical text in English. They collected their replacement candidates from a reference lexicon based on both graphical and phonological distance.…”

Section: Related Work (mentioning)

confidence: 99%


Toward Mitigating Adversarial Texts

Alshemali, Kalita

2019

IJCA


Neural networks are frequently used for text classification, but can be vulnerable to misclassification caused by adversarial examples: input produced by introducing small perturbations that cause the neural network to output an incorrect classification. Previous attempts to generate black-box adversarial texts have included variations of generating nonword misspellings, natural noise, synthetic noise, along with lexical substitutions. This paper proposes a defense against black-box adversarial attacks using a spell-checking system that utilizes frequency and contextual information for correction of nonword misspellings. The proposed defense is evaluated on the Yelp Reviews Polarity and the Yelp Reviews Full datasets using adversarial texts generated by a variety of recent attacks. After detecting and recovering the adversarial texts, the proposed defense increases the classification accuracy by an average of 26.56% on the Yelp Reviews Polarity dataset and 16.27% on the Yelp Reviews Full dataset. This approach further outperforms six of the publicly available, state-of-the-art spelling correction tools by at least 25.56% in terms of average correction accuracy.



“…The ranking step is the most challenging one and is the focus of the most work on non-word spelling correction (Fivez et al, 2017b). Our model uses both the features of the misspelling+candidate pair and the contextual information.…”

Section: Ranking Of Candidate Corrections (mentioning)

confidence: 99%

“…We evaluate the model on a data set from a very different content domain - clinical medical records. The genre of clinical free text poses an interesting challenge to the spelling correction task, since it is notoriously noisy (Fivez et al, 2017a; Lai et al, 2015).…”

Section: Out-of-domain Evaluation (mentioning)

confidence: 99%

A Benchmark Corpus of English Misspellings and a Minimally-supervised Model for Spelling Correction

Flor, Fried, Rozovskaya

2019

Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications


Spelling correction has attracted a lot of attention in the NLP community. However, models have been usually evaluated on artificially-created or proprietary corpora. A publicly-available corpus of authentic misspellings, annotated in context, is still lacking. To address this, we present and release an annotated data set of 6,121 spelling errors in context, based on a corpus of essays written by English language learners. We also develop a minimally-supervised context-aware approach to spelling correction. It achieves strong results on our data: 88.12% accuracy. This approach can also train with a minimal amount of annotated data (performance reduced by less than 1%). Furthermore, this approach allows easy portability to new domains. We evaluate our model on data from a medical domain and demonstrate that it rivals the performance of a model trained and tuned on in-domain data.


Radiologic text correction for better machine understanding

Kicsi, Szabó Ledenyi, Vidács

2024

Engineering Reports


Radiologic reports often contain misspellings that compromise report quality and pose challenges for machine understanding methods, which require syntactical correctness. General automatic misspell correction solutions are less effective in specialized documents, such as spinal radiologic reports, particularly in morphologically rich languages like Hungarian. Issues arise from complex conjugations and the modification of Latin terms per the rules of the native language. This study introduces a method for the automatic correction of these misspellings, utilizing the Hunspell software and field‐specific dictionaries. This approach, enhanced by linguistic analysis and statistical data, improves information retrieval, as demonstrated in machine‐learning‐based classification and rule‐based identification tasks. Notably, our method identified over 30% more valid errors than human annotators, highlighting its efficiency. We offer a primarily dictionary‐based solution for correcting highly specialized texts and explore the impact of nonword correction on machine understanding. This work underscores the significance of tailored spelling correction in enhancing text processing algorithms' accuracy.

