2014
DOI: 10.1136/amiajnl-2013-002034

Efficient sequential and parallel algorithms for record linkage

Abstract: Background and objective: Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Methods: Our algorithms employ hierarchical clustering algorithms as…

Citations: cited by 17 publications (20 citation statements)
References: 44 publications
“…We have developed algorithms that can integrate any number of datasets with a high accuracy [2], [3]. [3] describes record linkage algorithms based on single linkage clustering and Levenshtein distance method. Recently, we have extended our work to incorporate complete linkage.…”
Section: Record Linkage Accumulates Records Of Individuals (mentioning)
confidence: 99%
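
The combination named in this statement, single linkage clustering over Levenshtein distance, can be illustrated with a minimal sketch. This is not the cited paper's implementation: the function names, the threshold parameter, and the brute-force pairwise comparison are assumptions made for illustration. Cutting a single linkage hierarchy at a distance threshold is equivalent to taking connected components of the graph whose edges join records within that threshold, which is what the union-find below computes.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def single_linkage_clusters(records, threshold):
    """Group records whose chain of pairwise distances stays <= threshold.

    Hypothetical helper: compares all pairs (quadratic), so it is only
    suitable for small inputs.
    """
    parent = list(range(len(records)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if levenshtein(records[i], records[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, rec in enumerate(records):
        clusters.setdefault(find(i), []).append(rec)
    return list(clusters.values())


names = ["john smith", "jon smith", "jhon smith", "mary jones"]
clusters = single_linkage_clusters(names, threshold=2)
```

With a threshold of 2, the three spelling variants of the first name fall into one cluster and "mary jones" stays alone.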
“…In [18], parallel data linkage algorithms and performance results obtained with data sets scaled up to 6 million records are discussed. Further, in [19], a Web-based version of these algorithms is compared against Febrl and FRIL.…”
Section: A Data Linkage Tools (mentioning)
confidence: 99%
“…Design and the evaluation of sequential and parallel record linkage algorithms are discussed in [12]. One approach implements a pipeline to concatenate, sort and block records, generating a graph of matching pairs.…”
Section: Related Work (mentioning)
confidence: 99%
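
The pipeline this statement describes (concatenate, sort, block, then build a graph of matching pairs) could look roughly like the following sketch. Everything here is an assumption for illustration: the `match_graph` name, the caller-supplied `block_key` and `is_match` functions, and the record identifiers of the form (dataset index, position). The cited work's actual blocking and matching rules are not given in the statement.

```python
from itertools import combinations, groupby


def match_graph(datasets, block_key, is_match):
    """Return the edge set of a graph whose nodes are record ids.

    Hypothetical sketch: block_key and is_match are supplied by the caller.
    """
    # Concatenate: tag every record with (dataset index, position).
    records = [((d, i), rec)
               for d, dataset in enumerate(datasets)
               for i, rec in enumerate(dataset)]
    # Sort by blocking key so that each block is a contiguous run.
    records.sort(key=lambda item: block_key(item[1]))
    edges = set()
    # Block: compare records only within the same block.
    for _, block in groupby(records, key=lambda item: block_key(item[1])):
        for (id_a, rec_a), (id_b, rec_b) in combinations(list(block), 2):
            if is_match(rec_a, rec_b):
                edges.add((min(id_a, id_b), max(id_a, id_b)))
    return edges


datasets = [["smith john", "jones mary"], ["smith jon", "smyth john"]]
edges = match_graph(
    datasets,
    block_key=lambda rec: rec[:2],                           # first two letters
    is_match=lambda a, b: a.split()[0] == b.split()[0],      # same surname
)
```

Blocking limits comparisons to records that share a key, so "jones mary" is never compared against the "sm" block. Connected components of the resulting graph would then give the linked record groups.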