2020
DOI: 10.1145/3377455

Blocking and Filtering Techniques for Entity Resolution

Abstract: Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that correspond to the same real-world object. Due to its inherently quadratic complexity, a series of techniques accelerate it so that it scales to voluminous data. In this survey, we review a large number of relevant works under two different but related frameworks: Blocking and Filtering. The former restricts comparisons to entity pairs that are more likely to match, while the latter identifies quickly entity pairs th…

Citations: cited by 145 publications (64 citation statements)
References: references 160 publications (261 reference statements)
“…These probabilistic methods are summarized by Herzog et al [38], Winkler [102,103]. Blocking, which is surveyed by Christen [16], Papadakis et al [72,73], is considered an important subtask of entity matching, meant to tackle the quadratic complexity of potential matches. Christophides et al [17] specifically review entity matching techniques in the context of big data.…”
Section: Other Surveys and Extensive Overviews (mentioning)
confidence: 99%
“…We now describe the traditional blocking algorithms that can be considered the standard and most frequently adopted approaches [7,26].…”
Section: State of the Art (mentioning)
confidence: 99%
“…Entity matching is generally computationally tricky because the number of possible matches is (| | × | |). Techniques for reducing the potential number of matches to be evaluated are often referred to by the common term blocking, and many effective and efficient techniques have been developed [9]. Even with these techniques, it can still be quite computationally heavy.…”
Section: Background and State of the Art (mentioning)
confidence: 99%
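
The citation statements above describe blocking as a way to avoid comparing every possible pair of records. The following is a minimal sketch of key-based blocking in Python, not an implementation taken from the survey or from any citing paper. The function names (`blocking_key`, `candidate_pairs`), the first-letter key, and the sample records are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import product


def blocking_key(name: str) -> str:
    """Hypothetical blocking key: the first character of the lowercased name."""
    return name.strip().lower()[:1]


def candidate_pairs(left, right):
    """Return only the cross-source pairs whose records share a blocking key."""
    # Each block holds the records from both sources that map to the same key.
    blocks = defaultdict(lambda: ([], []))
    for rec in left:
        blocks[blocking_key(rec)][0].append(rec)
    for rec in right:
        blocks[blocking_key(rec)][1].append(rec)

    pairs = []
    for left_recs, right_recs in blocks.values():
        # Comparisons happen inside a block only, never across blocks.
        pairs.extend(product(left_recs, right_recs))
    return pairs


# Illustrative data (invented for this sketch).
left = ["Jon Smith", "Maria Lopez", "Ann Lee"]
right = ["John Smith", "M. Lopez", "Bob Ray"]

all_pairs = len(left) * len(right)      # exhaustive comparison count
blocked = candidate_pairs(left, right)  # comparisons left after blocking

print(f"exhaustive: {all_pairs} pairs, after blocking: {len(blocked)} pairs")
```

A key this coarse trades recall for speed: two records that refer to the same entity but start with different letters would never be compared, which is one reason practical schemes tend to use redundant or more tolerant keys.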