2015
DOI: 10.1016/j.jbi.2015.06.029

Combining knowledge- and data-driven methods for de-identification of clinical narratives

Abstract: A recent promise to access unstructured clinical data from electronic health records on a large scale has revitalized interest in automated de-identification of clinical notes, which includes the identification of mentions of Protected Health Information (PHI). We describe the methods developed and evaluated as part of the i2b2/UTHealth 2014 challenge to identify PHI defined by 25 entity types in longitudinal clinical narratives. Our approach combines knowledge-driven (dictionaries and rules) and data-driven…

Cited by 58 publications
(53 citation statements)
References 18 publications
“…Prior de-id studies used multi-pass architectures at the macro level. For instance, Dehghan et al used a two-pass approach, where the second pass was a post-processing phase to filter low-quality annotations from the first pass(16). Our system applied a multi-pass framework at a micro level: by methodology and PHI type.…”
Section: Discussion (mentioning)
confidence: 99%
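The two-pass idea described in the statement above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from Dehghan et al.: the rule set `FIRST_PASS_RULES`, the `Annotation` type, and the specific filtering heuristics are hypothetical. The first pass deliberately over-generates candidate PHI spans, and the second pass filters out low-quality ones.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int
    end: int
    label: str
    text: str


# Hypothetical first-pass rules; the AGE rule over-generates on purpose
# so that the second pass has something to filter.
FIRST_PASS_RULES = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DOCTOR": re.compile(r"\bDr\.\s+[A-Z][a-z]+\b"),
    "AGE": re.compile(r"\b\d{1,3}\b"),
}

# Context that must follow a number for it to be kept as an AGE (assumed heuristic).
AGE_CONTEXT = re.compile(r"\s*(?:-year-old|years? old|yo\b)")


def first_pass(text):
    """Pass 1: collect every candidate span matched by any rule."""
    candidates = []
    for label, pattern in FIRST_PASS_RULES.items():
        for m in pattern.finditer(text):
            candidates.append(Annotation(m.start(), m.end(), label, m.group()))
    return candidates


def second_pass(text, candidates):
    """Pass 2: drop candidates nested inside a longer one, and AGE spans lacking age context."""
    kept = []
    for a in candidates:
        nested = any(
            b is not a
            and b.start <= a.start
            and a.end <= b.end
            and (b.end - b.start) > (a.end - a.start)
            for b in candidates
        )
        if nested:
            continue
        if a.label == "AGE" and not AGE_CONTEXT.match(text[a.end:]):
            continue
        kept.append(a)
    return sorted(kept, key=lambda ann: ann.start)


if __name__ == "__main__":
    note = "Dr. Smith saw the 67 year old patient on 03/12/2014; call 555-123-4567."
    for ann in second_pass(note, first_pass(note)):
        print(ann.label, repr(ann.text))
```

In this sketch the second pass removes the digit groups that the AGE rule wrongly picked up inside the date and phone number, which is the kind of post-processing filter the quoted statement attributes to a second pass.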
“…The approaches we designed are built using two previously published methods [22], which include a knowledge-driven open source algorithm (mDEID) and a data-driven method (CliDEID) built using linear chain CRF. We used default (CRF++) parameters: L2-regularization with C=1.00, ETA=0.001.…”
Section: Methods (mentioning)
confidence: 99%
“…Submission 2 is built on top of mDEID, which was initially modeled on the i2b2/UTHealth 2014 Track I [22, 23]. The rules already available in mDEID were updated based on the new training data.…”
Section: Methods (mentioning)
confidence: 99%
“…This extra metadata would help reduce any ambiguity the DataSHIELD end users may encounter with short [footnote 15: The BioPortal, https://bioportal.bioontology.org/] (Dehghan, 2015; Meystre et al 2010; Zhou et al 2015). The successful import of structured text and clinical ontology data into DataSHIELD as presented here, combined with the modular nature of the infrastructure, would make it possible to integrate and utilise existing open source text mining tools to give DataSHIELD users increasing functionality.…”
mentioning
confidence: 99%