2020
DOI: 10.2196/18920

Secure Record Linkage of Large Health Data Sets: Evaluation of a Hybrid Cloud Model

Abstract: Background: The linking of administrative data across agencies provides the capability to investigate many health and social issues with the potential to deliver significant public benefit. Despite its advantages, the use of cloud computing resources for linkage purposes is scarce, with the storage of identifiable information on cloud infrastructure assessed as high risk by data custodians. Objective: This study aims to present a model for record linkage …

Cited by 4 publications
(3 citation statements)
References 31 publications
“…As Soundex is vulnerable to errors that happen at the prefix of the encoded text, the proposed protocol deploys an optimization to the algorithm by encoding the reverse of the original text with the second phonetic algorithm. Brown et al [22] presented a new hybrid cloud model for PPRL. They used containers to distribute the record linkage workload across multiple nodes.…”
Section: Related Work (mentioning)
confidence: 99%
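
The reverse-encoding idea in the statement above can be illustrated with a minimal sketch. It assumes standard American Soundex and, as a stand-in, applies Soundex to the reversed text as well, since the quoted passage does not name the second phonetic algorithm. The function names `soundex` and `phonetic_keys` are hypothetical and do not come from the cited work.

```python
# Letter-to-digit table for standard American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code of name ('' if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (encoded + "000")[:4]


def phonetic_keys(name):
    """Hypothetical helper: encode the text and its reverse.

    Assumption: Soundex is used in both directions purely for illustration.
    """
    return soundex(name), soundex(name[::-1])


# A first-letter variation changes the forward code but not the reversed one.
print(phonetic_keys("Katherine"))
print(phonetic_keys("Catherine"))
```

Because Soundex keeps the first letter verbatim, a variation at the start of a name changes the forward code, while the code of the reversed text is driven by the end of the name and can still agree.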
“…AtyImo is implemented over Apache Spark. No blocking or pruning techniques are implemented in [21], [22], [23] except for the last one as Different predicts have been analysed for blocking selection. Chen et al [24] examine the use of Spark-SQL for efficient parallel entity resolution.…”
Section: Related Work (mentioning)
confidence: 99%
“…When dealing with health care data, in particular, the lack of direct identifiers often means that a privacy-preserving record linkage (PPRL) is required to link the databases [12,13]; this method ensures that no personal data are revealed in the process of combining the datasets. Due to the potential errors and variation in indirect identifiers (e.g., a patient's name which could match as "Elizabeth", "Elisabeth", or "Liz"), probabilistic privacy-preserving linkages, often using Bloom filter encoding [14,15], have shown great success in health care datasets [13,16-19]. Deterministic PPRLs, or combinations between probabilistic and deterministic algorithms, have also become more common and have had demonstrated success using healthcare data [20-22].…”
Section: Introduction (mentioning)
confidence: 99%
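
As a rough illustration of the Bloom filter encoding mentioned in the statement above, the sketch below hashes padded character bigrams of a name into a fixed-size bit set and compares two encodings with the Dice coefficient. Everything in it is an assumption made for illustration, not a setting taken from the cited papers: the filter size, the number of hash functions, the shared secret, the double-hashing scheme, and the function names.

```python
import hashlib
import hmac


def bigrams(name):
    """Split a lower-cased, padded name into its set of character bigrams."""
    padded = f"_{name.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(name, size=256, num_hashes=4, secret=b"shared-secret"):
    """Return the set of bit positions set in the name's Bloom filter.

    Assumption: positions come from double hashing over one keyed
    HMAC-SHA256 digest; size, num_hashes, and secret are illustrative.
    """
    bits = set()
    for gram in bigrams(name):
        digest = hmac.new(secret, gram.encode("utf-8"), hashlib.sha256).digest()
        h1 = int.from_bytes(digest[:16], "big")
        h2 = int.from_bytes(digest[16:], "big")
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % size)
    return bits


def dice(a, b):
    """Dice coefficient of two bit-position sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


elizabeth = bloom_encode("Elizabeth")
print(dice(elizabeth, bloom_encode("Elisabeth")))  # spelling variant
print(dice(elizabeth, bloom_encode("Liz")))        # nickname
```

A spelling variant such as "Elisabeth" shares most bigrams with "Elizabeth" and so scores high, whereas a nickname such as "Liz" shares few and scores low. Bigram-based encodings therefore tolerate typographical variation better than they tolerate nicknames.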