2022
DOI: 10.2196/36711

Linking Biomedical Data Warehouse Records With the National Mortality Database in France: Large-scale Matching Algorithm

Abstract: Background: Often missing from or uncertain in a biomedical data warehouse (BDW), vital status after discharge is central to the value of a BDW in medical research. The French National Mortality Database (FNMD) offers open-source nominative records of every death. Matching large-scale BDW records with the FNMD combines multiple challenges: absence of unique common identifiers between the 2 databases, names changing over life, clerical errors, and the exponential growth of the number of comparisons …

citations
Cited by 8 publications
(5 citation statements)
references
References 8 publications
“…Projects requiring data management and extraction to integrate a research database are declared to the public registry of CHUN projects as a guarantee of transparency and to allow patient opposition. At this step, more complex methods for the extraction of information through natural language processing (NLP) [29], regular expression tools, or other structured data [30] may be applied. Finally, data extraction is constrained to strictly necessary data, following the parsimony principle, and only if access to data can be done in a secure environment.…”
Section: Methods (mentioning)
confidence: 99%
“…We measured the performance by computing area under the receiver operating characteristic curve (AUC) and accuracy and the variable importance of each method by using permutations. 9 …”
Section: Methods (mentioning)
confidence: 99%
“…‘Big data’ in health and care continues to have serious data quality issues, necessitating extensive cleansing, and often translation between heterogeneous data structures and coding schemes. 34 In many health systems, even fundamentals like patient matching between disparate data sets remain problematic. 35 Some health services, such as primary care in England, have financial incentive schemes that motivate standardised recording and coding 36 but despite this, the practice of clinical coding remains highly variable. 37 …”
Section: Sharing Data Wisely Builds Trust and Supports Learning Healt... (mentioning)
confidence: 99%
“… 34 In many health systems, even fundamentals like patient matching between disparate data sets remain problematic. 35 Some health services, such as primary care in England, have financial incentive schemes that motivate standardised recording and coding 36 but despite this, the practice of clinical coding remains highly variable. 37 This poor data quality is one aspect of the problem of being ‘data rich, but information poor.…”
Section: Sharing Data Wisely Builds Trust and Supports Learning Healt... (mentioning)
confidence: 99%