2016
DOI: 10.1016/j.jbi.2016.03.019

Optimizing annotation resources for natural language de-identification via a game theoretic framework

Abstract: Objective: Electronic medical records (EMRs) are increasingly repurposed for activities beyond clinical care, such as to support translational research and public policy analysis. To mitigate privacy risks, healthcare organizations (HCOs) aim to remove potentially identifying patient information. A substantial quantity of EMR data is in natural language form and there are concerns that automated tools for detecting identifiers are imperfect and leak information that can be exploited by ill-intentioned data reci…

Cited by 14 publications (8 citation statements)
References 39 publications

“…Under some circumstances, such as preparing a corpus for public release under a data use agreement, the cost of using four—or even more— annotators may be considered money well spent. Under other circumstances the additional cost burden of attempting to improve annotation completeness in a corpus (or reduce overlooked PII in a de-identification task) will outweigh the expected benefits, at which point “the juice is not worth the squeeze.” Newly reported research applying game theory to questions of clinical text de-identification suggests less than perfect redaction of PII may be sufficient in some situations, (26) highlighting the relevance of quantifying the costs associated with incremental improvement in annotation completeness.…”
Section: Discussion (mentioning; confidence: 99%)
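The cost-benefit reasoning in the statement above can be made concrete with a small calculation. The sketch below is a hypothetical illustration, not a model from either paper: it assumes each annotator independently catches a given PII mention with the same probability, assigns a fixed cost per annotator and a fixed harm per overlooked mention, and picks the annotator count with the lowest total. The function names and parameters (`recall`, `cost_per_annotator`, `harm_per_miss`) are assumptions introduced only for illustration.

```python
# Hypothetical cost model for choosing how many annotators to use.
# Assumptions (not from the cited work): annotators miss PII independently,
# every annotator has the same recall, and each overlooked mention carries a
# fixed harm expressed in the same units as annotator cost.


def expected_missed(n_mentions: int, recall: float, k: int) -> float:
    """Expected PII mentions overlooked by all k independent annotators."""
    return n_mentions * (1.0 - recall) ** k


def total_cost(n_mentions: int, recall: float, k: int,
               cost_per_annotator: float, harm_per_miss: float) -> float:
    """Annotation spend plus expected harm from mentions nobody caught."""
    return k * cost_per_annotator + harm_per_miss * expected_missed(n_mentions, recall, k)


def best_annotator_count(n_mentions: int, recall: float, cost_per_annotator: float,
                         harm_per_miss: float, max_k: int = 10) -> int:
    """Cheapest k in 1..max_k; ties are broken toward fewer annotators."""
    return min(
        range(1, max_k + 1),
        key=lambda k: (total_cost(n_mentions, recall, k, cost_per_annotator, harm_per_miss), k),
    )


if __name__ == "__main__":
    # Low harm per miss: a second annotator pays off, a third does not.
    print(best_annotator_count(1000, 0.9, cost_per_annotator=500, harm_per_miss=10))    # 2
    # High harm per miss (e.g. a public release): more annotators are worth it.
    print(best_annotator_count(1000, 0.9, cost_per_annotator=500, harm_per_miss=1000))  # 4
```

Under these assumptions an extra annotator stops paying for itself once the expected harm it removes falls below its cost, which is the "juice is not worth the squeeze" point in quantitative form.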
“…First, we propose a novel explicit threat model for this problem, allowing us to make formal guarantees about the vulnerability of the published data to adversarial re-identification attempts. Our model bears some relationship to a recent work by Li et al [ 45 ] who also consider an adversary using machine learning to re-identify residual identifiers. However, our model combines this with a budget-limited attacker who can manually inspect instances; in addition, our publisher model involves the choice of a redaction policy, whereas Li et al focus on the publisher's decision about the size of the training data, and use a traditional learning-based redaction approach.…”
Section: Related Work (mentioning; confidence: 94%)
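The publisher-side framing attributed to Li et al in the statement above (a publisher deciding how much training data to annotate for a learning-based redactor, facing an attacker who targets residual identifiers) can be sketched as a leader-follower game. This is a hypothetical toy, not the published model: the learning curve `detector_recall`, the fixed `attack_cost`, and the per-identifier gain and loss are all illustrative assumptions, and the attacker here has no manual-inspection budget.

```python
import math

# Hypothetical leader-follower (Stackelberg-style) sketch. The publisher (leader)
# picks how many documents to annotate; the attacker (follower) attacks only if
# the expected gain from residual identifiers exceeds a fixed cost. All functional
# forms and parameter values are illustrative assumptions.


def detector_recall(n_docs: int, r_max: float = 0.99, scale: float = 200.0) -> float:
    """Assumed saturating learning curve: recall grows with annotated documents."""
    return r_max * (1.0 - math.exp(-n_docs / scale))


def residual_ids(n_docs: int, total_ids: int) -> float:
    """Expected identifiers the detector leaves in the published text."""
    return total_ids * (1.0 - detector_recall(n_docs))


def attacker_attacks(n_docs: int, total_ids: int, gain_per_id: float, attack_cost: float) -> bool:
    """Follower's best response: attack only if expected gain exceeds the cost."""
    return gain_per_id * residual_ids(n_docs, total_ids) > attack_cost


def publisher_cost(n_docs: int, total_ids: int, cost_per_doc: float, loss_per_id: float,
                   gain_per_id: float, attack_cost: float) -> float:
    """Leader's cost: annotation spend plus the loss incurred if the attacker strikes."""
    cost = cost_per_doc * n_docs
    if attacker_attacks(n_docs, total_ids, gain_per_id, attack_cost):
        cost += loss_per_id * residual_ids(n_docs, total_ids)
    return cost


def best_training_size(candidates, **params) -> int:
    """Leader anticipates the follower's response; ties go to fewer documents."""
    return min(candidates, key=lambda n: (publisher_cost(n, **params), n))


PARAMS = dict(total_ids=10_000, cost_per_doc=1.0, loss_per_id=5.0,
              gain_per_id=1.0, attack_cost=500.0)

if __name__ == "__main__":
    n_star = best_training_size(range(0, 2001, 100), **PARAMS)
    print(n_star, round(residual_ids(n_star, PARAMS["total_ids"])))  # 700 399
```

With these made-up numbers the publisher stops at 700 documents even though roughly 399 identifiers remain, because the assumed attacker's expected gain no longer covers the attack cost. This echoes, in toy form, the observation quoted earlier that less than perfect redaction may be sufficient in some situations.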
“…The context of utterance of entities, such as speculation, continues to be explored including in languages other than English such as Chinese [80]. While personal health identifiers are entities which have received sustained interest, research directions in the field of de-identification are switching from entity recognition to revisiting evaluation methods [81][82] and annotation efforts optimizing de-identification efforts [83].…”
Section: Foundational Methods Of Clinical NLP Take Both Innovative An… (mentioning; confidence: 99%)