2023
DOI: 10.21203/rs.3.rs-3316615/v1
Preprint

Automatic Deidentification of French Electronic Health Records: A Cost-Effective Approach Exploiting Distant Supervision and Deep Learning Models

Mohamed El azzouzi,
Gouenou Coatrieux,
Reda Bellafqira
et al.

Abstract: Background: Electronic health records (EHRs) contain valuable information for clinical research; however, the sensitive nature of healthcare data presents security and confidentiality challenges. Deidentification is therefore essential to protect personal data in EHRs and comply with government regulations. Named entity recognition (NER) methods have been proposed to remove personal identifiers, with deep learning-based models achieving better performance. However, manual annotation of training data is time-co…


Cited by 1 publication (2 citation statements); references 39 publications.
“…29, LLMs fine-tuned for medical purposes demonstrated slightly worse performance (precision of 0.91, recall of 0.95) in anonymizing medical documents. 27 We are the first to show the high performance of inference from local LLMs, specifically Llama-2 and -3 models, in extracting PII from medical documents, whereas others achieved only insufficient results: Liu et al attempted zero-shot medical text anonymization with GPT-4 and Llama models, but the Llama models failed to generate any relevant anonymization output for the tested medical documents, with an accuracy of 0.61. GPT-4 demonstrated a superior accuracy of 0.908 for implicit and 0.99 for explicit shot prompting on their synthetic medical dataset.…”
Section: Discussion
confidence: 92%
“…Existing NLP NER approaches have demonstrated notable performance for anonymizing medical documents (rule-based: recall 0.95, precision 0.93). 27 Recent advancements in transformer models have showcased similar capabilities in various NLP tasks (precision and recall 0.94). 28 LLMs like GPT-4 have proven to possess advanced anonymization skills.…”
Section: Discussion
confidence: 99%
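The citation statements above compare deidentification systems by precision and recall over extracted PII entities. As a minimal sketch of how such figures are typically computed (the entity spans and labels below are hypothetical, not taken from any of the cited evaluations):

```python
# Entity-level precision/recall for a PII-extraction evaluation.
# An entity is a (label, start, end) triple; a prediction counts as a
# true positive only on an exact match with a gold annotation.
def precision_recall(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                      # exact-match true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical annotations for one document.
gold = {("NAME", 0, 12), ("DATE", 30, 40), ("ID", 55, 63)}
pred = {("NAME", 0, 12), ("DATE", 30, 40), ("ID", 70, 78)}  # one wrong span

p, r = precision_recall(pred, gold)
print(p, r)  # 2 of 3 predictions correct, 2 of 3 gold entities found
```

Exact-match scoring is the stricter convention; some evaluations also report partial-overlap matches, which would raise both figures here since the mislocated ID span overlaps nothing.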