BioNLP 2017
DOI: 10.18653/v1/w17-2317

Unsupervised Context-Sensitive Spelling Correction of Clinical Free-Text with Word and Character N-Gram Embeddings

Abstract: We present an unsupervised context-sensitive spelling correction method for clinical free-text that uses word and character n-gram embeddings. Our method generates misspelling replacement candidates and ranks them according to their semantic fit, by calculating a weighted cosine similarity between the vectorized representation of a candidate and the misspelling context. We greatly outperform two baseline off-the-shelf spelling correction tools on a manually annotated MIMIC-III test set, and counter the frequenc…
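The ranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy three-dimensional vectors, the function names, and the choice of weighting each context word by the reciprocal of its distance to the misspelling are all assumptions made for illustration, since the abstract only states that a weighted cosine similarity is used.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_vector(context, embeddings, dim):
    """Weighted sum of context word vectors.

    `context` is a list of (word, distance) pairs, where distance >= 1 is the
    number of tokens between the word and the misspelling. The 1/distance
    weighting is an assumption, not taken from the abstract.
    """
    total = [0.0] * dim
    for word, distance in context:
        vec = embeddings.get(word)
        if vec is None:  # skip out-of-vocabulary context words
            continue
        weight = 1.0 / distance
        for i, x in enumerate(vec):
            total[i] += weight * x
    return total


def rank_candidates(candidates, context, embeddings, dim):
    """Return candidates sorted by cosine similarity to the context, best first."""
    ctx = context_vector(context, embeddings, dim)
    scored = [(cosine(embeddings[c], ctx), c) for c in candidates if c in embeddings]
    return [c for _, c in sorted(scored, reverse=True)]


# Hypothetical toy embeddings, hand-picked so the example is easy to follow.
TOY_EMBEDDINGS = {
    "patient": [1.0, 0.0, 0.0],
    "has": [0.5, 0.5, 0.0],
    "fever": [0.9, 0.1, 0.0],
    "river": [0.0, 0.0, 1.0],
}

# Context for "patient has fevr": "has" is 1 token away, "patient" is 2 away.
ranked = rank_candidates(
    ["river", "fever"], [("patient", 2), ("has", 1)], TOY_EMBEDDINGS, dim=3
)
print(ranked)
```

In this toy setup the candidate whose vector points in the same direction as the weighted context comes first; a real system would load trained word and character n-gram embeddings instead of the hand-written dictionary.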

Cited by 46 publications (42 citation statements)
References: 11 publications
“…To achieve this goal, the misspelled words must be corrected. The spell-checkers proposed by Fivez et al [7] and Lu et al [21] are in biomedical domains. The implementation code of Fivez et al's is not publicly available.…”
Section: Methods (mentioning)
confidence: 99%
“…(1) Generation of a candidate pool: Candidate suggestions for each detected misspelling are generated by returning all words from our dictionary that have an edit distance [19] up to a given threshold. (2) Scoring Function: Given a misspelled token in a text and a set of candidate corrections for that token, the scoring function ranks all suggested candidates based on the following four scores (see Figure 1): [7] calculated contextual similarity scores using neural word embeddings, taking the context around the misspelling into account. To calculate the contextual similarity score for a candidate, this paper uses a similar approach.…”
Section: Methods (mentioning)
confidence: 99%
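Step (1) of the citing paper, generating a candidate pool from a dictionary by edit distance, can be sketched as follows. The citing statement does not say which edit distance its reference [19] defines, so plain Levenshtein distance is assumed here; the dictionary contents, function names, and default threshold are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def generate_candidates(misspelling, dictionary, max_distance=2):
    """Return all dictionary words within max_distance edits, sorted alphabetically."""
    return sorted(
        word for word in dictionary
        if edit_distance(misspelling, word) <= max_distance
    )


# Hypothetical dictionary for illustration only.
DICTIONARY = {"fever", "fewer", "never", "river", "patient"}

candidates = generate_candidates("fevr", DICTIONARY, max_distance=1)
print(candidates)
```

Scanning the whole dictionary is linear in its size and is only meant to show the idea; the resulting pool would then be passed to a scoring function such as the contextual similarity score the statement mentions.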
“…The ranking step is the most challenging one and is the focus of the most work on non-word spelling correction (Fivez et al, 2017b). Our model uses both the features of the misspelling+candidate pair and the contextual information.…”
Section: Ranking Of Candidate Corrections (mentioning)
confidence: 99%
“…We evaluate the model on a data set from a very different content domain - clinical medical records. The genre of clinical free text poses an interesting challenge to the spelling correction task, since it is notoriously noisy (Fivez et al, 2017a; Lai et al, 2015).…”
Section: Out-of-domain Evaluation (mentioning)
confidence: 99%