2016
DOI: 10.1371/journal.pone.0154446

Efficient Record Linkage Algorithms Using Complete Linkage Clustering

Abstract: Data held by different agencies often describe the same individuals. Linking these datasets to identify all the records that belong to each individual is a crucial and challenging problem, especially given the large volumes of data involved. Many available record linkage algorithms suffer from either time inefficiency or low accuracy in distinguishing matches from non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage prob…
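The abstract is truncated before it describes the proposed algorithms, so the following is only a minimal sketch of generic complete linkage clustering applied to record linkage, not the authors' method. It assumes each record has been reduced to a single name string, that records are compared by Levenshtein edit distance, and that a hypothetical `threshold` decides when two clusters may merge. Under complete linkage the distance between two clusters is the largest distance between any of their members, so every pair of records in a final cluster is within the threshold.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def complete_linkage(records: list[str], threshold: int) -> list[list[str]]:
    """Agglomerative clustering with complete linkage.

    The distance between two clusters is the LARGEST pairwise distance
    between their members; clusters merge only while that value is
    <= threshold (hypothetical parameter, chosen for illustration).
    """
    n = len(records)
    # Naive all-pairs comparison: O(n^2) distance computations.
    dist = {(i, j): levenshtein(records[i], records[j])
            for i, j in combinations(range(n), 2)}

    def d(i: int, j: int) -> int:
        return dist[(i, j)] if i < j else dist[(j, i)]

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None  # (linkage distance, index a, index b)
        for a, b in combinations(range(len(clusters)), 2):
            link = max(d(i, j) for i in clusters[a] for j in clusters[b])
            if link <= threshold and (best is None or link < best[0]):
                best = (link, a, b)
        if best is None:  # no pair of clusters is close enough to merge
            break
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: combinations() guarantees a < b
    return [[records[i] for i in c] for c in clusters]


# Hypothetical records containing spelling variants of two people.
names = ["john smith", "jon smith", "john smyth",
         "mary jones", "mary jones", "marie jones"]
for group in complete_linkage(names, threshold=2):
    print(group)
```

With a threshold of 2 the three spellings of each name end up in the same cluster. This naive version compares every pair of records, which is exactly the kind of cost that becomes prohibitive for the large data volumes the abstract mentions.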

Cited by 24 publications (11 citation statements)
References 51 publications
“…Additionally, we must consider issues such as data processing and rules for linking records for a single patient. Many statistical methods have been developed for linking records corresponding to individual patients across data sources, and many of these methods explicitly address issues of privacy. Statistical methods have also been developed for combining data across distributed data sources where data from individual patients are not accessible.…”
Section: Emerging Uses Of Electronic Health Record Data and Combinati… (mentioning)
confidence: 99%
“…Many statistical methods have been developed for linking records corresponding to individual patients across data sources, and many of these methods explicitly address issues of privacy. [223][224][225][226][227] Statistical methods have also been developed for combining data across distributed data sources where data from individual patients are not accessible. [228][229] Yang et al. developed methods for performing meta-analysis based on existing GWAS, and similar methods should be developed for PheWAS studies in the future.…”
Section: Emerging Uses Of Electronic Health Record Data and Combinati… (mentioning)
confidence: 99%
“…Many statistical methods have been developed for linking records corresponding to individual subjects across data sources, and many of these methods explicitly address issues of privacy. [237][238][239][240][241] Statistical methods have also been developed for combining data across distributed data sources where data from individual subjects is not accessible, called distributed regression analysis. These methods involve sharing sufficient statistics of the data (functions of the individual-level data) from which the individual-level data are not recoverable.…”
Section: Section 5: Emerging Uses Of Electronic Health Record Data An… (mentioning)
confidence: 99%
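The quotation above on distributed regression analysis says that sites share only sufficient statistics rather than individual-level rows. A minimal sketch of that idea, assuming ordinary least-squares linear regression (the quoted text does not name a model, so this choice is purely illustrative): each site reduces its rows to the sums X^T X and X^T y, and a coordinator adds these summaries and solves the normal equations. The function names and the two example sites are hypothetical.

```python
def local_summary(rows):
    """Site-side step: reduce (features, y) rows to X^T X and X^T y.

    An intercept column of 1s is prepended to every feature vector.
    Only these sums leave the site; the rows themselves are never sent.
    """
    p = len(rows[0][0]) + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, y in rows:
        x = [1.0, *features]
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def coordinator_fit(summaries):
    """Coordinator step: add the sites' summaries, then solve the normal equations."""
    p = len(summaries[0][1])
    xtx = [[sum(s[0][i][j] for s in summaries) for j in range(p)] for i in range(p)]
    xty = [sum(s[1][i] for s in summaries) for i in range(p)]
    return solve(xtx, xty)


# Two hypothetical sites whose data follow y = 1 + 2x.
site_a = [((0.0,), 1.0), ((1.0,), 3.0)]
site_b = [((2.0,), 5.0), ((3.0,), 7.0)]
beta = coordinator_fit([local_summary(site_a), local_summary(site_b)])
print(beta)  # intercept and slope, approximately [1.0, 2.0]
```

Because X^T X and X^T y are plain sums over rows, adding the per-site summaries gives the same normal equations as fitting on the pooled rows, so the coordinator recovers the pooled estimate without receiving any individual record.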
“…The parallel version was tested with a synthetic data set of 6 million records, presenting a linear speedup varying from 7.5 to 26.4 (from 8 to 32 cores, respectively). These methods are explained in more depth in [11], and a Web-based version, called RLT-S, is discussed and compared to other existing free tools (Febrl and FRIL) in [10].…”
Section: Related Work (mentioning)
confidence: 99%
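The speedups quoted above can be put in perspective by computing parallel efficiency, here taken as speedup divided by core count (a standard measure, not one the quoted text itself reports). This short sketch uses only the two figures from the quotation.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Speedup divided by core count; 1.0 means perfectly linear scaling."""
    return speedup / cores


# Figures taken from the quoted statement: (cores, reported speedup).
reported = [(8, 7.5), (32, 26.4)]
for cores, speedup in reported:
    eff = parallel_efficiency(speedup, cores)
    print(f"{cores:>2} cores: speedup {speedup:>4}, efficiency {eff:.1%}")
```

Since 7.5 / 8 = 0.9375 and 26.4 / 32 = 0.825, efficiency declines somewhat as cores are added, which suggests the reported scaling is close to, though not exactly, linear.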