2008
DOI: 10.1016/j.artmed.2007.10.001

A de-identifier for medical discharge summaries

Abstract: Objective: Clinical records contain significant medical information that can be useful to researchers in various disciplines. However, these records also contain personal health information (PHI) whose presence limits the use of the records outside of hospitals. The goal of de-identification is to remove all PHI from clinical records. This is a challenging task because many records contain foreign and misspelled PHI; they also contain PHI that are ambiguous with non-PHI. These complications are compounded by the…
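To make the stated goal concrete, here is a minimal Python sketch of only the final redaction step. It assumes some detector has already produced PHI spans as `(start, end, category)` character offsets; the span format, the category names (`PATIENT`, `HOSPITAL`, `DATE`) and the `redact` helper are hypothetical illustrations, not the de-identifier described in this paper. Finding the spans, with the foreign, misspelled and ambiguous PHI the abstract mentions, is the hard part.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, category) as character offsets into the text.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each detected PHI span with a bracketed category placeholder."""
    pieces = []
    cursor = 0
    for start, end, category in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping PHI spans")
        pieces.append(text[cursor:start])  # keep the non-PHI text before the span
        pieces.append(f"[{category}]")     # substitute the PHI itself
        cursor = end
    pieces.append(text[cursor:])           # trailing non-PHI text
    return "".join(pieces)
```

For example, with spans covering "Smith", "Boston General" and "03/14/2007", the sentence "Mr. Smith was seen at Boston General on 03/14/2007." becomes "Mr. [PATIENT] was seen at [HOSPITAL] on [DATE]."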

Cited by 67 publications (48 citation statements)
References 13 publications
“…Various supervised machine learning algorithms, including CRFs [16]-[18], Support Vector Machines (SVM) [19], and Decision Trees [20], [21], have been employed. Chen et al [8] proposed a non-parametric Bayesian Hidden Markov Model (HMM) that learns a potentially infinite number of latent variables by using a Dirichlet process prior.…”
Section: Related Work (mentioning)
confidence: 99%
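The Dirichlet process prior mentioned in this passage is what allows the number of latent states to remain unbounded. As a hedged illustration of that prior alone (not of Chen et al's HMM, whose details are not given here), the sketch below samples cluster assignments from the Chinese restaurant process, a standard sequential representation of a Dirichlet process: each new item joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to the concentration parameter `alpha`. The function name and parameters are assumptions for illustration.

```python
import random
from typing import List


def chinese_restaurant_process(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Sample cluster labels for n items under a CRP with concentration alpha."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    assignments: List[int] = []
    counts: List[int] = []  # counts[k] = number of items already in cluster k
    for _ in range(n):
        # Existing cluster k has weight counts[k]; a brand-new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        assignments.append(k)
    return assignments
```

The number of clusters is not fixed in advance: it grows slowly (roughly logarithmically) with `n`, and larger values of `alpha` tend to produce more clusters.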
“…Uzuner et al [7] have studied the role of local context (i.e. the words that are immediate neighbours of the target PHI or that have immediate syntactic relation with it) for de-identification when using support vector machine classifiers.…”
Section: De-identification (mentioning)
confidence: 99%
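A minimal sketch of lexical local-context features for a candidate token, assuming whitespace-tokenized input. It captures only the immediately neighbouring words within a fixed window; the syntactic relations mentioned in the passage would additionally require a parser and are not modelled here. The function name, window size and feature keys are hypothetical.

```python
from typing import Dict, List


def local_context_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Lexical features for tokens[i] and its neighbours within +/- window positions."""
    feats = {"w0": tokens[i].lower()}
    for offset in range(1, window + 1):
        left, right = i - offset, i + offset
        # Pad with sentence-boundary markers when the window runs off either end.
        feats[f"w-{offset}"] = tokens[left].lower() if left >= 0 else "<BOS>"
        feats[f"w+{offset}"] = tokens[right].lower() if right < len(tokens) else "<EOS>"
    return feats
```

Feature dictionaries of this shape can be converted into sparse vectors with scikit-learn's `DictVectorizer` and passed to a linear SVM such as `LinearSVC`; that pairing is an assumption for illustration, not the setup reported by Uzuner et al.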
“…They observed that features that thoroughly capture local context are beneficial to the PHI de-identification task. While not relying on local context features as thoroughly as Uzuner et al [7], Anonym does use features that implicitly capture local context information, such as token n-grams and part-of-speech.…”
Section: De-identification (mentioning)
confidence: 99%
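Token n-grams and part-of-speech tags drawn from a window around a token carry local context implicitly, without explicit neighbour or syntax features. The sketch below is a hypothetical extractor, not Anonym's actual feature set; it assumes the tokens have already been POS-tagged (in the example the tags are supplied by hand rather than produced by a tagger).

```python
from typing import List, Set


def ngram_features(tokens: List[str], pos_tags: List[str], i: int,
                   n: int = 2, window: int = 2) -> Set[str]:
    """Token n-grams and relative-position POS tags from a window around tokens[i]."""
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must be aligned")
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    words = [t.lower() for t in tokens[lo:hi]]
    feats = set()
    # Every contiguous n-gram that fits inside the window.
    for j in range(len(words) - n + 1):
        feats.add("ngram=" + "_".join(words[j:j + n]))
    # POS tags keyed by offset from the target token.
    for j in range(lo, hi):
        feats.add(f"pos[{j - i}]={pos_tags[j]}")
    return feats
```

For the tokens `["Seen", "by", "Dr.", "Smith", "today"]` with target "Smith", the bigram `ngram=dr._smith` and the tag `pos[-1]=NNP` both signal a title preceding a name, even though no feature names the neighbouring word's role explicitly.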
“…The task is typically approached as named entity recognition (NER) of PHI data types. Two main approaches have been followed and quite often combined: knowledge-driven methods that rely on dictionaries and rules for regularized PHI types [10–13] and machine-learning and hybrid approaches that aim at learning from data [14–19]. The results of the community challenges have suggested that machine-learning approaches, in principle, provide better and more consistent performance [7, 20].…”
Section: Introduction (mentioning)
confidence: 99%
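Knowledge-driven rules work best for "regularized" PHI types whose surface form is predictable. Below is a minimal sketch with two hypothetical regular-expression rules (US-style numeric dates and 10-digit phone numbers). Real rule sets are far larger, and, as the passage notes, rules alone struggle with PHI that is ambiguous with ordinary words, which is where machine-learning approaches help.

```python
import re
from typing import List, Tuple

# Hypothetical rules for two regularized PHI types; not taken from any cited system.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-.]\d{4}\b"),
}


def find_regular_phi(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, category) spans matched by the rule set, sorted by position."""
    spans = []
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), category))
    return sorted(spans)
```

On "Seen on 03/14/2007; call (617) 555-1234." this returns one `DATE` span covering "03/14/2007" and one `PHONE` span covering "(617) 555-1234".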